Mean Field Stochastic Adaptive Control

Arman C. Kizilkale and Peter E. Caines 



Abstract 

For noncooperative games the mean field (MF) methodology provides decentralized strategies which yield Nash 
equilibria for large population systems in the asymptotic limit of an infinite (mass) population. The MF control laws 
use only the local information of each agent on its own state and own dynamical parameters, while the mass effect is 
calculated offline using the distribution function of (i) the population's dynamical parameters, and (ii) the population's 
cost function parameters, for the infinite population case. These laws yield approximate equilibria when applied in 
the finite population. 

In this paper, these a priori information conditions are relaxed, and incrementally the cases are considered where, 
first, the agents estimate their own dynamical parameters, and, second, estimate the distribution parameter in (i) and 
(ii) above. 

An MF stochastic adaptive control (SAC) law in which each agent observes a random subset of the population 
of agents is specified, where the ratio of the cardinality of the observed set to that of the number of agents decays 
to zero as the population size tends to infinity. Each agent estimates its own dynamical parameters via the recursive 
weighted least squares (RWLS) algorithm and the distribution of the population's dynamical parameters via maximum 
likelihood estimation (MLE). Under reasonable conditions on the population dynamical parameter distribution, the 
MF-SAC Law applied by each agent results in (i) the strong consistency of the self parameter estimates and the 
strong consistency of the population distribution function parameters; (ii) the long run average $L^2$ stability of all
agent systems; (iii) a (strong) $\epsilon$-Nash equilibrium for the population of agents for all $\epsilon > 0$; and (iv) the a.s. equality
of the long run average cost and the non-adaptive cost in the population limit. 

Index Terms 

adaptive control, mean field stochastic systems, Nash equilibria, stochastic optimal control 

I. Introduction 

Overview 

The control and optimization of large-scale stochastic systems is evidently of importance due to their ubiquitous 
appearance in engineering, industrial, social and economic settings. The complexity of these problems is amplified 
by the fact that for many such systems the agents involved have conflicting objectives; hence, it is appropriate 
to consider optimization methodologies based upon individual payoffs or costs. In particular, game theory has 
been formulated to capture such individual interest seeking behaviour of the agents in many social, economic and
manmade systems. However, in a large-scale dynamic model, this approach results in an analytic complexity which
is in general prohibitively high, and correspondingly leads to few substantive dynamic optimization results. 

The optimization of large-scale linear control systems wherein (i) many agents are coupled with each other via 
their individual dynamics, and (ii) the costs are in an "individual to the mass" form was presented in [1], [2] where 
the theory of mean field (MF) control (previously termed Nash Certainty Equivalence) was introduced. It is to be 
noted that the dynamic large-scale cost coupled optimization structure of [2] is motivated by a variety of scenarios, 
for instance, those analysed in [3]-[6]. 

In the literature, studies of stochastic dynamic games and team problems may be traced to the 1960s (see e.g. 
[7]-[9]) while within the optimal control context weakly interconnected systems were studied in [10], and in a two 
player noncooperative nonlinear dynamic game setting Nash equilibria were analysed in [11], where the coefficients 
for the coupling terms in the dynamics and costs are required to be small. In contrast to these studies, games with 
large populations are analyzed in [2], [12], [13]. In [2] the $\epsilon$-Nash equilibrium properties are analysed for a system
of competing agents where individual control laws use local information and the average effect of all agents taken 
together, henceforth referred to as the mass. Overall, the MF methodology for noncooperative LQG games with mean 
field coupling has been developed in [1], [2], [14] providing decentralized strategies which yield Nash equilibria. 
A nonlinear extension using McKean-Vlasov Markov process models is also presented in [15]. 

The central notion of MF theory is that for general classes of large population stochastic dynamic games there 
exist game theoretic Nash equilibria for the individual agents when each applies certain competitive strategies (i.e. 
control laws) with respect to the mass effect resulting from all the agents' strategies. Here each agent is modelled 
by an individually controlled stochastic system and the systems interact through their individual cost functions and 
possibly via weak dynamical interaction. The key feedback nature of the mean field solutions is that the individual 
competitive actions against the mass, plus local feedback control, act so as to collectively reproduce that mass 
behaviour. The mass effect and associated feedback control laws are calculated offline for the infinite population 
case and yield approximate equilibria when applied in the finite population case. 

For this class of game problems, a related approach has been independently developed in [16], [17], where the 
notion of oblivious equilibrium by use of a mean field approximation for models of many firm industry dynamics 
is proposed. The asymptotic equilibrium properties of a market with a large population of agents is studied in [18]. 
Another related work is [19] where a mean field Nash equilibrium is studied subject to the assumed existence of a 
factorizing mean field distribution corresponding to the propagation of chaos for the infinite population system. The 
work in [20] presents mean field control results for a Markov Decision Problem (MDP) formulation of evolutionary 
games and teams where the basic system hypothesis is the exchangeability of the underlying random processes. 

Stochastic Adaptive Control 

For discrete time dynamics the long run average (LRA) asymptotically optimal adaptive tracking problem was 
solved in [21]; subsequently, it was shown in [22] that strongly consistent parameter estimates may be obtained by 
the use of persistently excited controls. The LRA stochastic (sample path) mean square stability for continuous time 
linear stochastic adaptive systems was established in [23]. The weighted least squares (WLS) scheme introduced
in [24] was shown in [25] to be convergent without stability and excitation assumptions, and an LRA asymptotically
optimal solution to the continuous time adaptive LQG control problem under controllability and observability
assumptions using the WLS scheme for identification was subsequently obtained in [26] following [27]-[29] and
[30].

MF Stochastic Adaptive Control 

It is important to note that in the non-adaptive MF theory [1], [2] each agent uses its self state and self dynamical 
parameters (i.e. its own state and its own dynamical parameters) and statistical information on the dynamical 
parameters of the population in order to generate the control action. The natural initial problem in the development 
of adaptive MF stochastic system theory is that where each agent needs to estimate its own dynamical parameters, 
while its control actions are permitted to be explicit functions of the parameter distribution of the entire population 
of competing agents [31]. Subsequent problem generalizations are such that (i) each agent also needs to estimate 
the distribution parameter of the population's dynamical parameters [32], and (ii) cost function parameters also vary 
over the population and this distribution parameter is unknown to each agent and hence needs to be estimated [33]. 
In this paper we provide a solution to the most general problem in this sequence. 

The inclusion of learning procedures for the identification by a given agent of the dynamical and cost function 
parameters of other competing agents in a stochastic dynamic system, or of the statistical distribution of these 
parameters in a mass of competing agents, introduces new features into the system theoretic MF setup. In this 
connection we note that in the economics literature the so-called "privacy of information" on dynamical parameters 
and cost function parameters is an important issue [34]-[36]. 

This paper presents an MF stochastic adaptive control (SAC) law in which each agent observes a random subset 
of the population of agents. The MF-SAC Law specifies that the ratio of the cardinality of the observed set of 
agents to that of the population of agents is chosen so that it decays to zero as the population size tends to infinity. 
When the MF-SAC Law is applied by each member of the population, each agent estimates its self dynamical 
parameters via the recursive weighted least squares (RWLS) algorithm and the distribution of the population's 
dynamical parameters via maximum likelihood estimation (MLE). 

Under reasonable conditions on the population dynamical parameter distribution, the MF-SAC Law results in 
(i) the strong consistency of the self parameter estimates and the strong consistency of the population distribution 
function parameters; (ii) the long run average $L^2$ stability of all agent systems; (iii) a (strong) $\epsilon$-Nash equilibrium
for the population of agents for all $\epsilon > 0$; and (iv) the a.s. equality of the long run average cost and the non-adaptive
cost in the population limit. 

Notation 

We denote the set of nonnegative real numbers by $\mathbb{R}_+$, the set of nonnegative integers by $\mathbb{Z}_+$, and the set of
strictly positive integers by $\mathbb{Z}_1$. The norm $\|\cdot\|$ denotes the 2-norm of vectors and matrices, and $\|x\|_Q^2 = x^T Q x$.
$C_b = \{x : x \in C, \sup_{t \ge 0} \|x(t)\| < \infty\}$ denotes the family of all bounded continuous functions, and for any
$x \in C_b$, $\|x\|_\infty$ denotes the supremum norm: $\|x\|_\infty = \sup_{t \ge 0} \|x(t)\|$. $\mathrm{Tr}(X)$ denotes the trace, and $X^T$ denotes the
transpose of a matrix $X$.

II. Problem Formulation and MF-SAC Law Specification

A. Review of Non-Adaptive MF Stochastic Control 

We consider a large population of N stochastic dynamic agents which (subject to independent controls) are 
stochastically independent, but which shall be cost coupled, where the individual dynamics are defined by 

$$dx_i = [A_i x_i + B_i u_i]\,dt + D\,dw_i, \quad t \ge 0, \quad 1 \le i \le N, \tag{1}$$

where for agent $\mathcal{A}_i$, $x_i \in \mathbb{R}^n$ is the state, $u_i \in \mathbb{R}^m$ is the control input, $w_i \in \mathbb{R}^r$ is a standard Wiener process on a
sufficiently large underlying probability space $(\Omega, \mathcal{F}, P)$ such that $w_i$ is progressively measurable with respect to
$\mathcal{F}^{w_i} = \{\mathcal{F}_t^{w_i}; t \ge 0\}$. We denote the state configuration by $x = (x_1, \ldots, x_N)^T$, and (with an abuse of notation)
the population average state by $x^N = (1/N) \sum_{i=1}^N x_i$.

The long run average (LRA) cost function for the agent $\mathcal{A}_i$, $1 \le i \le N$, is given by

$$J_i^N(u_i, u_{-i}) = \limsup_{T \to \infty} \frac{1}{T} \int_0^T \{\|x_i - m^N\|_{Q_i}^2 + \|u_i\|_R^2\}\,dt, \tag{2}$$

w.p.1, where we assume the cost-coupling to be of the form $m^N(t) = m(x^N(t) + \eta)$, $\eta \in \mathbb{R}^n$. The coefficients
$\theta_i = [A_i, B_i, Q_i] \in \Theta \subset \mathbb{R}^{n(n+m+(n+1)/2)}$ will be called the dynamical and cost function parameters. The
disturbance weight matrix $D$ and the control action penalizing matrix $R$ are constant matrices, which are assumed
to be known by all agents, and assumed to be the same for all agents in the population. The choice of homogeneous
parameters for $D$ and $R$ is only for notational brevity; the analysis is similar for varying $D$ and $R$. The function
$u_i(\cdot)$ is the control input of the agent $\mathcal{A}_i$ and $u_{-i}$ denotes the control inputs of the complementary set of agents
$\mathcal{A}_{-i} = \{\mathcal{A}_j, j \ne i, 1 \le j \le N\}$.

For the basic MF control problem, the following assumptions are adopted. 

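A1: The disturbance processes $w_i$, $1 \le i \le N$, are mutually independent and independent of the initial conditions,
and $\sup_{i \ge 1} [\mathrm{Tr}\,\Sigma_i + \mathbb{E}\|x_i(0)\|^2] < \infty$, where $\mathbb{E}\, w_i w_i^T = \Sigma_i$, $1 \le i \le N$. ■

A2: $\mathring{\Theta}$ is an open set such that for each $\theta^T = [A_\theta, B_\theta, Q_\theta] \in \mathring{\Theta}$, $[A_\theta, B_\theta]$ is controllable and $[Q_\theta^{1/2}, A_\theta]$ is
observable. ■

A3: Let the parameter set $\Theta$ be a compact set such that $\Theta \subset \mathring{\Theta} \subset \mathbb{R}^{n(n+m+(n+1)/2)}$, and

$$\|R^{-1}\|\,\gamma \int_{\theta \in \Theta} \|Q_\theta\|\,\|B_\theta\|^2 \Big( \int_0^\infty \|e^{A_*(\theta)\tau}\|\,d\tau \Big)^2 dF_\zeta(\theta) < 1,$$

where $A_* = A - B R^{-1} B^T \Pi$, $\zeta$ is the distribution parameter and $\gamma$ is defined in the next hypothesis. ■

A4: The cost-coupling is of the form: $m^N(\cdot) = m((1/N) \sum_{k=1}^N x_k + \eta)$, $\eta \in \mathbb{R}^n$, where the function $m(\cdot)$ is
Lipschitz continuous on $\mathbb{R}^n$ with a Lipschitz constant $\gamma > 0$, i.e. $\|m(x) - m(y)\| \le \gamma \|x - y\|$ for all $x, y \in \mathbb{R}^n$. ■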

For dynamics (1) and cost function (2), a production output planning example is provided in [2] that satisfies 
the assumptions given above. Each agent's production level $x_i$ is modeled by (1), and each agent's cost function is
of tracking type (2), where the tracked signal is a function of price, which is an averaging function of production
levels: $m^N(t) = m(x^N(t) + \eta)$, $\eta \in \mathbb{R}^n$.

Following [2], the long run average (LRA) mean field (MF) problem is formulated in [37]. Each agent $\mathcal{A}_i$, $1 \le
i \le N$, obtains the positive definite solution to the algebraic Riccati equation

$$A_i^T \Pi_i + \Pi_i A_i - \Pi_i B_i R^{-1} B_i^T \Pi_i + Q_i = 0. \tag{3}$$

Moreover, for a given mass tracking signal $x^* \in C_b[0, \infty)$ the mass offset function $s_i(t)$ is generated by the
differential equation

$$-\frac{ds_i(t)}{dt} = A_i^T s_i(t) - \Pi_i B_i R^{-1} B_i^T s_i(t) - Q_i x^*(t), \quad t \ge 0. \tag{4}$$

Then, the optimal tracking control law [38] is given by

$$u_i(t) = -R^{-1} B_i^T (\Pi_i x_i(t) + s_i(t)), \quad t \ge 0, \tag{5}$$

where $u_i(\cdot)$ solves $\inf J_i(u_i, x^*)$, which is defined below by an abuse of notation:

$$J_i(u_i, x^*) = \limsup_{T \to \infty} \frac{1}{T} \int_0^T \{\|x_i - x^*\|_{Q_i}^2 + \|u_i\|_R^2\}\,dt \quad \text{w.p.1}.$$

Note that the procedure above assumes a given mass tracking signal $x^*$. The equation system to calculate $x^*$ will
be given subsequently.
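
For intuition, the scalar case ($n = m = 1$) of (3)-(5) can be worked out in closed form. The following Python sketch is illustrative only: the function names and the numerical values are assumptions and do not come from the paper. For scalars $a, b, q, r$ with $b \ne 0$ and $q, r > 0$, the positive root of (3) is $\Pi = r(a + \beta)/b^2$ with $\beta = \sqrt{a^2 + b^2 q / r}$, and for a constant tracking signal $x^*$ the bounded solution of (4) is the constant $s = -q x^* / \beta$.

```python
import math

def riccati_scalar(a, b, q, r):
    """Positive root of 2*a*Pi - (b**2 / r) * Pi**2 + q = 0, the scalar form of (3)."""
    beta = math.sqrt(a * a + b * b * q / r)
    return r * (a + beta) / (b * b)

def mass_offset_constant(a, b, q, r, x_star):
    """Bounded solution of (4) when x* is constant: s = -q * x* / beta."""
    beta = math.sqrt(a * a + b * b * q / r)
    return -q * x_star / beta

def tracking_control(a, b, q, r, x, x_star):
    """Scalar form of the control law (5): u = -(b / r) * (Pi * x + s)."""
    pi = riccati_scalar(a, b, q, r)
    s = mass_offset_constant(a, b, q, r, x_star)
    return -(b / r) * (pi * x + s)

# Hypothetical numbers chosen only for illustration.
a, b, q, r = -0.5, 1.0, 2.0, 1.0
pi = riccati_scalar(a, b, q, r)
residual = 2 * a * pi - (b * b / r) * pi * pi + q   # should vanish
closed_loop = a - b * b * pi / r                    # equals -beta, hence stable
u = tracking_control(a, b, q, r, x=0.0, x_star=3.0)
```

In this scalar setting the closed-loop coefficient $a - b^2 \Pi / r$ equals $-\beta < 0$, which is why the feedback part of (5) is stabilizing regardless of the sign of $a$.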

We first define the empirical distribution associated with the first $N$ agents:
$F_N(\theta) = \frac{1}{N} \sum_{i=1}^N 1_{(\theta_i \le \theta)}$, $\theta \in \mathbb{R}^{n(n+m+(n+1)/2)}$, where $\{\theta_i, 1 \le i \le N\}$ is a set of random matrices on $\Theta$
with the probability distribution $F_\zeta(\theta)$, parameterized by $\zeta \in P \subset \mathring{P} \subset \mathbb{R}^p$, the population dynamical and cost
function distribution parameter such that $P$ is compact and $\mathring{P}$ is an open set. Then we employ the following
assumption.

A5: There exists a family of distribution functions $\{F_\zeta(\theta); \theta \in \Theta\}$, $\zeta \in P$, such that $F_N(\cdot) \to F_\zeta(\cdot)$ w.p.1
weakly on $\theta \in \Theta$ and uniformly over $\zeta \in P$ as $N \to \infty$. ■

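Each agent solves the equation system below to calculate the mass tracking signal $x^*(\tau, \zeta)$, $t_0 \le \tau < \infty$, offline,
for an infinite population of agents.

Definition 2.1: Mean Field (MF) Equation System on $[t_0, \infty)$:

$$\begin{aligned}
-\frac{ds_\theta}{d\tau} &= (A_\theta^T - \Pi_\theta B_\theta R^{-1} B_\theta^T) s_\theta - Q_\theta x^*(\tau, \zeta), \\
\frac{d\bar{x}_\theta}{d\tau} &= (A_\theta - B_\theta R^{-1} B_\theta^T \Pi_\theta) \bar{x}_\theta - B_\theta R^{-1} B_\theta^T s_\theta, \\
\bar{x}(\tau, \zeta) &= \int_\Theta \bar{x}_\theta \, dF_\zeta(\theta), \\
x^*(\tau, \zeta) &= m(\bar{x}(\tau, \zeta) + \eta), \quad t_0 \le \tau < \infty.
\end{aligned} \tag{6}$$

■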

Under A1-A4, the MF Equation System admits a unique bounded solution [2]. 
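
As a rough illustration of how the mass tracking signal can be computed offline, consider the time-invariant solution of (6) in the scalar case with a linear coupling $m(x) = \gamma x$ (so that $x^* = \gamma(\bar{x} + \eta)$) and a finite set of equally weighted parameter types standing in for $F_\zeta$. These simplifications, the function names and the numbers are assumptions made for this sketch only. Setting the derivatives in (6) to zero gives $\bar{x}_\theta = c_\theta x^*$ with $c_\theta = b^2 q / (r a^2 + b^2 q)$, so $x^*$ is a fixed point of $x^* \mapsto \gamma(\mathbb{E}[c_\theta]\, x^* + \eta)$. In this scalar setting the integral condition in A3 reduces to $\gamma\, \mathbb{E}[c_\theta] < 1$, which makes the map a contraction.

```python
def stationary_gain(a, b, q, r):
    """c_theta = b^2 q / (r a^2 + b^2 q): stationary mean state per unit of x*."""
    return b * b * q / (r * a * a + b * b * q)

def stationary_mass_signal(types, r, gamma, eta, tol=1e-12, max_iter=10_000):
    """Iterate x* <- gamma * (mean_c * x* + eta) until successive iterates agree."""
    mean_c = sum(stationary_gain(a, b, q, r) for a, b, q in types) / len(types)
    if gamma * mean_c >= 1.0:
        raise ValueError("contraction condition gamma * E[c_theta] < 1 violated")
    x_star = 0.0
    for _ in range(max_iter):
        x_next = gamma * (mean_c * x_star + eta)
        if abs(x_next - x_star) < tol:
            return x_next, mean_c
        x_star = x_next
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical population of three equally likely (a, b, q) types.
types = [(-0.5, 1.0, 2.0), (-1.0, 1.0, 1.0), (-2.0, 1.0, 1.0)]
x_star, mean_c = stationary_mass_signal(types, r=1.0, gamma=0.6, eta=0.25)

# For a linear coupling the fixed point is also available in closed form.
closed_form = 0.6 * 0.25 / (1.0 - 0.6 * mean_c)
```

The iteration mirrors the informal statement that individual responses to the mass signal must collectively reproduce that signal; the general time-varying system (6) requires solving the coupled differential equations rather than this algebraic shortcut.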

The Global Observation Control Set $\mathcal{U}_g^N$: For the optimality analysis, we first introduce the global observation
control set. The set of control inputs $\mathcal{U}_g^N$ consists of all feedback controls adapted to $\{\theta_j, 1 \le j \le
N; F_\zeta(\theta); \mathcal{F}_t^N, t \ge 0\}$, where $\mathcal{F}_t^N$ is the $\sigma$-field generated by the set $\{x_j(\tau); 0 \le \tau \le t, 1 \le j \le N\}$.

The Local Observation Control Set $\mathcal{U}_{l,i}^N$: The local observation control set of agent $\mathcal{A}_i$ is the set of control inputs
which consists of the feedback controls adapted to the set $\{\theta_i; F_\zeta(\theta); \mathcal{F}_t^i, t \ge 0\}$. The $\sigma$-field $\mathcal{F}_t^i$ is generated
by $(x_i(\tau); 0 \le \tau \le t)$, and $\mathcal{F}_t^N$ is the $\sigma$-field generated by the set $\{x_j(\tau); 0 \le \tau \le t, 1 \le j \le N\}$.

Theorem 2.1: Non-Adaptive MF Stochastic Control (SC) Theorem [37, following [2]]

Let A1-A5 hold. The MF Stochastic Control Law (5) generates a set of controls $\mathcal{U}_{MF}^N = \{u_i^0; 1 \le i \le N\}$, $1 \le
N < \infty$, with

$$u_i^0(t) = -R^{-1} B_i^T (\Pi_i x_i(t) + s_i(t)), \quad t \ge 0, \tag{7}$$

such that

(i) the MF equations (6) have a unique solution;

(ii) all agent system trajectories $x_i$, $1 \le i \le N$, are LRA-$L^2$ stable w.p.1;

(iii) $\{\mathcal{U}_{MF}^N; 1 \le N < \infty\}$ yields an $\epsilon$-Nash equilibrium for all $\epsilon > 0$, i.e., for all $\epsilon > 0$, there exists $N(\epsilon)$ such
that for all $N \ge N(\epsilon)$

$$J_i^N(u_i^0, u_{-i}^0) - \epsilon \le \inf_{u_i \in \mathcal{U}_g^N} J_i^N(u_i, u_{-i}^0) \le J_i^N(u_i^0, u_{-i}^0).$$

■

Conceptually, Theorem 2.1 may be paraphrased to say that individual competitive actions against the mass effect
collectively produce the mass behaviour, and hence the $\epsilon$-Nash equilibrium is obtained. In the proof of Theorem
2.1, the results are first established for an infinite population and then are shown to be approximated by a large
finite population with the approximation error decaying to zero as the population size goes to infinity; it is this
which gives the $\epsilon$-Nash property.

B. MF Stochastic Adaptive Control (SAC) 

In this section we first present the identification schemes to be used by each agent under the MF Stochastic 
Adaptive Control (SAC) Law to estimate both the self dynamical parameters and the population dynamical and cost 
function distribution parameter. In other words, the analysis concerns a family of agents $\mathcal{A}_i$, $1 \le i \le N$, whose
control action at any instant is not permitted to be an explicit function of the self dynamical parameters $[A_i, B_i]$ and
the dynamical and cost function distribution parameter $\zeta$. At time $t \ge 0$, the self dynamical parameters are estimated
from the input-output sample path $\{x_i(\tau), u_i(\tau); 0 \le \tau \le t\}$ of $\mathcal{A}_i$; in other words, each agent $\mathcal{A}_i$ performs
the identification based upon observations of its own trajectory. The distribution parameter $\zeta$ is estimated from
observations $\{x_j(\tau), u_j(\tau); 0 \le \tau \le t, j \in Obs_i(N)\}$ on a random subset of agents $Obs_i(N)$ where $|Obs_i(N)| \to
\infty$, and $|Obs_i(N)|/N \to 0$ as $N \to \infty$.
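
One simple way to satisfy both requirements on the observation set is to let its size grow like $\sqrt{N}$. The sketch below is only an illustration under that assumption: the rule `ceil(sqrt(N))`, the function name, the exclusion of agent $i$ from its own observation set and the use of uniform sampling without replacement are choices made here and are not specified in the text.

```python
import math
import random

def sample_observation_set(i, n_agents, rng):
    """Return a random subset of agents other than i with size ceil(sqrt(N))."""
    size = min(math.ceil(math.sqrt(n_agents)), n_agents - 1)
    others = [j for j in range(1, n_agents + 1) if j != i]
    return sorted(rng.sample(others, size))

rng = random.Random(0)
for n_agents in (10, 1_000, 100_000):
    obs = sample_observation_set(1, n_agents, rng)
    # The set size grows without bound while the observed fraction shrinks.
    print(n_agents, len(obs), len(obs) / n_agents)
```

Any other size rule with $|Obs_i(N)| \to \infty$ and $|Obs_i(N)|/N \to 0$, such as a logarithmic one, would serve equally well for this purpose.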

The Adaptive Agent Control Set $\mathcal{U}_{A,i}^N$: We next define the set of control inputs $\mathcal{U}_{A,i}^N$, the admissible control set of
an adaptive agent $\mathcal{A}_i$, which consists of all feedback controls adapted to the set $\{\mathcal{F}_{i,t}, \mathcal{F}_{i,t}^{obs}, t \ge 0; Q_i\}$. The $\sigma$-field
$\mathcal{F}_{i,t}$ is generated by the agent's own trajectory and control input, $\{x_i(\tau), u_i(\tau); 0 \le \tau \le t\}$, and $\mathcal{F}_{i,t}^{obs}$, $t \ge 0$, is
the observation $\sigma$-field generated by the trajectories and control inputs in the set $Obs_i(N)$, $\{x_j(\tau), u_j(\tau); 0 \le \tau \le
t, j \in Obs_i(N)\}$. For definiteness in this paper, the identification algorithms employed are recursive weighted least
squares (RWLS) for the self dynamical parameter identification and maximum likelihood estimation (MLE) for the
distribution parameter identification. However, any identification scheme which generates consistent estimates w.p.1
(subject to the given hypotheses) will also yield the system asymptotic equilibrium properties to be established.

1) Self Dynamical Parameter Identification (SDPI): We denote the self estimate of the matrix $\theta_i$ by $\hat{\theta}_{i,t} =
[\hat{A}_{i,t}, \hat{B}_{i,t}, Q_i]$, $t \ge 0$, and the estimate of $\zeta$ by $\hat{\zeta}_{i,t}^{N_0}$, $t \ge 0$, where $N_0 = |Obs_i(N)|$, and assume $\hat{\theta}_{i,t}$ and
$\hat{\zeta}_{i,t}^{N_0}$ are generated at $t \ge 0$ by the identification algorithm. Note that the self cost function parameter $Q_i$ is in the
information set of agent $\mathcal{A}_i$, and is therefore not to be estimated. We adopt the notation $\zeta^0 = \zeta$, $\theta^0 = \theta$ for the
true parameters in the system. At time $t \ge 0$, agent $\mathcal{A}_i$ solves the RWLS equations with the measurement variable
set as $dx_t$ with the regression vector $[x_t^T, u_t^T]$ in order to obtain the estimates $[\hat{A}_{i,t}, \hat{B}_{i,t}]$. To ensure controllability
and observability of the estimates, a projection method is used; the estimates are projected onto the compact set
$\Theta|_{Q_i} \subset \mathring{\Theta}|_{Q_i}$, where given $Q_i$, $[A_\theta, B_\theta]$ is controllable and $[Q_i^{1/2}, A_\theta]$ is observable. Note that $\Theta$ is known to all
agents in the system.

2) Population Dynamical and Cost Function Distribution Parameter Identification: 

a) Population Dynamical Parameter Identification (PDPI): At $t \ge 0$, agent $\mathcal{A}_i$ estimates dynamical parameters
$\{[\hat{A}_{j,t}, \hat{B}_{j,t}]; j \in Obs_i(N)\}$ of the agents in its observation set, $Obs_i(N)$. The admissible control set of agent $\mathcal{A}_i$
is $\mathcal{U}_{A,i}^N$, whose controls may depend on observations of the trajectories and control inputs of all the agents in the set
$Obs_i(N)$. Based upon this observation set, agent $\mathcal{A}_i$ obtains estimates $\{[\hat{A}_{j,t}, \hat{B}_{j,t}]; j \in Obs_i(N)\}$ solving the RWLS equations
using $\{dx_{j,t}; j \in Obs_i(N)\}$ as the measurement variable with the regression vector $\{[x_{j,t}^T, u_{j,t}^T]; j \in Obs_i(N)\}$.

b) Population Cost Function Parameter Identification (PCPI): The solution to the RWLS equations with the
inputs described above generates the estimates $\{[\hat{A}_{j,t}, \hat{B}_{j,t}]; j \in Obs_i(N)\}$. The objective at this point for each
agent is to obtain the estimates $\{\hat{Q}_{j,t}; j \in Obs_i(N)\}$. The RWLS equations are then solved employing the observed
control inputs $\{u_j(t); j \in Obs_i(N)\}$ such that agent $\mathcal{A}_i$ calculates $\{-(\hat{B}_{j,t}^T)^{-1} R u_j(t); j \in Obs_i(N)\}$ and sets it as
the measurement vector. Note that one needs the following additional assumption.

A6': $B_\theta$ is invertible (and hence, necessarily, $[A_\theta, B_\theta]$ is controllable) for all $\theta \in \Theta$. ■

This rather restrictive assumption is only needed for the cost function parameter identification; therefore, PCPI will
be given as an optional procedure in the MF-SAC Law. The observed control action is in the form (7); therefore
arranging the variables in a certain way to be specified later, agent $\mathcal{A}_i$ obtains the estimates $\{\hat{\Pi}_{j,t}, \hat{s}_j(t); j \in
Obs_i(N)\}$. Solving the algebraic Riccati equation for $\hat{Q}_{j,t}$, agent $\mathcal{A}_i$ obtains its estimates $\{\hat{Q}_{j,t}; j \in Obs_i(N)\}$.
The symmetry of $\{\hat{Q}_{j,t}; j \in Obs_i(N)\}$ is guaranteed. To ensure the positive definiteness of the obtained estimates
$\{\hat{Q}_{j,t}; j \in Obs_i(N)\}$, $[A, B]$ controllability, $[Q^{1/2}, A]$ observability, and that the requirement in A3 holds, the set
$\{\hat{A}_{j,t}, \hat{B}_{j,t}, \hat{Q}_{j,t}; j \in Obs_i(N)\}$ is projected onto $\Theta$.

c) Distribution Parameter Identification (DPI): Once the projected estimates $\hat{\theta}_t^{N_0} = \{\hat{A}_{j,t}^{pr}, \hat{B}_{j,t}^{pr}, \hat{Q}_{j,t}^{pr}; j \in
Obs_i(N)\}$, $N_0 = |Obs_i(N)|$, are obtained, agent $\mathcal{A}_i$ forms the scaled log-likelihood-type function

$$L(\hat{\theta}_t^{N_0}; \zeta) = -\frac{1}{N_0} \log \Big( \prod_{j \in Obs_i(N)} f_\zeta(\hat{\theta}_{j,t}^{pr}) \Big),$$

where $f_\zeta$ denotes the density of $F_\zeta$, and calculates $\hat{\zeta}_{i,t}^{N_0}$, the estimate of the distribution parameter, solving
$\arg\min_{\zeta \in P} L(\hat{\theta}_t^{N_0}; \zeta)$. Note that $P$ is known to all agents in the system.

Overall using the identification procedures explained above, agent $\mathcal{A}_i$ obtains estimates $[\hat{A}_{i,t}, \hat{B}_{i,t}]$ and $\hat{\zeta}_{i,t}^{N_0}$, and
forms the self estimated dynamical parameter vector $\hat{\theta}_{i,t} = [\hat{A}_{i,t}, \hat{B}_{i,t}, Q_i]$.
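
To make the DPI step concrete, suppose (purely as an assumption for illustration) that a scalar parameter of each observed agent is drawn from a Gaussian family with unknown mean $\zeta$ and known standard deviation, and that $P$ is a closed interval. Minimizing the scaled negative log-likelihood over $P$ then amounts to clipping the sample mean to $P$. The names `neg_log_likelihood` and `mle_on_interval` and all numbers below are hypothetical.

```python
import math

def neg_log_likelihood(samples, zeta, sigma):
    """Scaled negative log-likelihood -(1/N0) log prod_j f_zeta(sample_j) for N(zeta, sigma^2)."""
    n0 = len(samples)
    const = 0.5 * math.log(2.0 * math.pi * sigma * sigma)
    return const + sum((x - zeta) ** 2 for x in samples) / (2.0 * sigma * sigma * n0)

def mle_on_interval(samples, lo, hi):
    """Minimizer over the compact set P = [lo, hi]: the sample mean clipped to P."""
    mean = sum(samples) / len(samples)
    return min(max(mean, lo), hi)

# Hypothetical projected estimates of a scalar parameter for the observed agents.
estimates = [-1.2, -0.8, -1.1, -0.9, -1.0]
zeta_hat = mle_on_interval(estimates, lo=-2.0, hi=-0.5)

# Sanity check against a coarse grid search over P = [-2.0, -0.5].
grid = [-2.0 + 0.001 * k for k in range(1501)]
zeta_grid = min(grid, key=lambda z: neg_log_likelihood(estimates, z, sigma=0.3))
```

For families without a closed-form maximizer, the grid search (or any numerical minimizer restricted to $P$) plays the role of the arg min; compactness of $P$ guarantees that a minimizer exists when the likelihood is continuous in $\zeta$.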

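3) Certainty Equivalence Adaptive Control: At time $t$, employing $\hat{\zeta}_{i,t}^{N_0}$, agent $\mathcal{A}_i$ solves the MF Equation System
(6) to obtain $x^*(\tau, \hat{\zeta}_{i,t}^{N_0})$, $t \le \tau < \infty$. Then using $\hat{\theta}_{i,t}$ agent $\mathcal{A}_i$ solves the Riccati equation (3), obtains $\hat{\Pi}_{i,t} = \Pi(\hat{\theta}_{i,t})$
and solves the mass offset differential equation (4) to obtain $\hat{s}_i(t) = s(t; \hat{\theta}_{i,t}, \hat{\zeta}_{i,t}^{N_0})$. The certainty equivalence
adaptive control for the admissible control set $\mathcal{U}_{A,i}^N$ is then given by $\hat{u}_i^0(t) = u_i^0(t; \hat{\theta}_{i,t}, \hat{\zeta}_{i,t}^{N_0}) = -R^{-1} \hat{B}_{i,t}^T (\hat{\Pi}_{i,t} x_i(t) +
\hat{s}_i(t))$, $t \ge 0$.

To obtain the main MF-SAC result stated in Theorem 2.2, we first establish the strong consistency for the family
of estimates $\{\hat{\theta}_{i,t}; t \ge 0, 1 \le i \le N\}$ and $\{\hat{\zeta}_{i,t}^{N_0}; t \ge 0, 1 \le i \le N\}$.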

4) Control Excitation for Consistent Identification: In order to generate a consistent sequence of estimates
$(\hat{\theta}_{i,t}; t \ge 0)$ w.p.1, a diminishing excitation is added to the adaptive control in (5) to give

$$\hat{u}_i^0(t) = -R^{-1} \hat{B}_{i,t}^T (\hat{\Pi}_{i,t} x_i(t) + \hat{s}_i(t)) + \xi_k [\varepsilon_i(t) - \varepsilon_i(k)], \quad t \in (k, k+1], \ k \in \mathbb{N}, \ 1 \le i \le N, \tag{8}$$

where $\hat{u}_i^0(0) = 0$, $(\xi_k^2 = \log k / \sqrt{k}; k \in \mathbb{Z}_1)$, and the process $(\varepsilon(t), t \ge 0)$ is an $\mathbb{R}^m$-valued standard Wiener process
that is independent of $(w_i(t); t \ge 0)$. The sequence of random processes $(\varepsilon(t) - \varepsilon(k); t \in (k, k+1], k \in \mathbb{N})$ is
assumed to be mutually independent and all members of the set have the same probability law on $(0, 1]$. Since the
sequence $(\xi_k; k \in \mathbb{N})$ converges to zero at a suitable rate, it will be established following [26] that the diminishing
control excitation $(\xi_k[\varepsilon(t) - \varepsilon(k)]; t \in (k, k+1], k \in \mathbb{N})$ provides sufficient excitation for almost sure consistent
identification and decreases sufficiently rapidly not to affect the limiting performance of the system with
respect to $\hat{\theta}_{i,t} = \theta_i^0$, $t \ge 0$, i.e. the non-adaptive case. In other words, the asymptotic performance achieved is equal
to the one obtained in the non-adaptive case almost surely. The diminishing control excitation (8) was introduced
in [27], [28], and it was shown in [26] to generate strongly consistent parameter estimates via RWLS for dynamical
parameters of the system (1) under certainty equivalence adaptive control.
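
The role of the diminishing amplitude can be seen numerically. The sketch below assumes an amplitude sequence of the form $\xi_k^2 = \log k / \sqrt{k}$; the function names and the time grid are choices made for the illustration. On each interval $(k, k+1]$ the dither is $\xi_k$ times a Wiener increment restarted at $k$, so its variance at the end of the interval is $\xi_k^2$ per component, which tends to zero while $\sum_k \xi_k^2$ diverges.

```python
import math
import random

def dither_amplitude(k):
    """xi_k with xi_k**2 = log(k) / sqrt(k), for integers k >= 1 (assumed form)."""
    return math.sqrt(math.log(k) / math.sqrt(k))

def simulate_dither(k, steps, rng):
    """Sample xi_k * (eps(t) - eps(k)) on a uniform grid over (k, k + 1]."""
    dt = 1.0 / steps
    xi = dither_amplitude(k)
    increment, path = 0.0, []
    for _ in range(steps):
        increment += rng.gauss(0.0, math.sqrt(dt))  # Wiener increment over dt
        path.append(xi * increment)
    return path

rng = random.Random(0)
amplitudes = [dither_amplitude(k) for k in (10, 100, 10_000, 1_000_000)]
path = simulate_dither(k=100, steps=50, rng=rng)
```

The amplitudes in this list decrease monotonically, which illustrates how the excitation fades slowly enough to keep identifying the parameters but fast enough to leave the long run average cost unaffected.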

C. The MF Stochastic Adaptive Control (SAC) Law

We observe that the control law (8) has three terms computed from the local state information, the self dynamical
parameter estimates and the population distribution parameter estimate. It can be written for each agent $\mathcal{A}_i$, $1 \le
i \le N$, in the form $\hat{u}_i^0(t; \hat{\theta}_{i,t}, \hat{\zeta}_{i,t}^{N_0}) = u_i^{loc}(t; \hat{\theta}_{i,t}) + u_i^{pop}(t; \hat{\theta}_{i,t}, \hat{\zeta}_{i,t}^{N_0}) + u_i^{dit}(t)$, $t \ge 0$, where $u_i^{loc}(\cdot)$ is the
LQG feedback for the system of agent $\mathcal{A}_i$ based on local information; $u_i^{pop}(\cdot)$ is the mass offset term based on
local information and population information received from the observed set; and $u_i^{dit}(\cdot)$ is the locally generated
dither input. In this section we present the MF-SAC Law which generates the feedback control law $\hat{u}_i^0(t) =
u_i^0(t; \hat{\theta}_{i,t}, \hat{\zeta}_{i,t}^{N_0})$, $t \ge 0$, that leads to the $\epsilon$-Nash equilibrium. The continuous time MF-SAC Law for agent $\mathcal{A}_i$, $1 \le
i \le N$, with parameter $\theta_i \in \Theta$, $1 \le i \le N$, is summarized in three major steps in Table I.
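Specification of the MF-SAC Law

For agent $\mathcal{A}_i$, $t \ge 0$:

(i) Self parameter $\theta_i$ identification:

Solve the RWLS equations (9) for the dynamical parameters (subscript $i$ suppressed for clarity):

$$\hat{\vartheta}_t^T = [\hat{A}_t, \hat{B}_t], \quad \phi_t^T = [x_t^T, u_t^T],$$

$$\begin{aligned}
d\hat{\vartheta}_t &= a(t) \Psi_t \phi_t [dx_t^T - \phi_t^T \hat{\vartheta}_t\,dt], \\
d\Psi_t &= -a(t) \Psi_t \phi_t \phi_t^T \Psi_t\,dt,
\end{aligned} \tag{9}$$

and calculate

$$\hat{\vartheta}_t^{pr} = \arg\min_{\psi \in \Theta|_{Q_i}} \|\hat{\vartheta}_t - \psi\|; \quad \hat{\theta}_{i,t}^{pr} = [(\hat{\vartheta}_t^{pr})^T, Q_i]. \tag{10}$$

(ii) Population-parameter identification:

(a) Solve the RWLS equations (9) for the dynamical parameters $\{\hat{A}_{j,t}, \hat{B}_{j,t}; j \in Obs_i(N)\}$.

(b) Solve the RWLS equations (11) for the cost function parameters $\{\hat{\Pi}_{j,t}, \hat{s}_{j,t}; j \in Obs_i(N)\}$ (subscript $j$ suppressed for clarity):

$$\hat{\nu}_t^T = [\hat{\Pi}_t, \hat{s}_t], \quad \varphi_t^T = [x_t^T, 1],$$

$$\begin{aligned}
d\hat{\nu}_t &= a(t) \Psi_t \varphi_t [(-(\hat{B}_t^T)^{-1} R u_t)^T - \varphi_t^T \hat{\nu}_t]\,dt, \\
d\Psi_t &= -a(t) \Psi_t \varphi_t \varphi_t^T \Psi_t\,dt,
\end{aligned} \tag{11}$$

solve the algebraic Riccati Equation (12) for $\{\hat{Q}_{j,t}; j \in Obs_i(N)\}$,

$$\hat{Q}_{j,t} = -\hat{A}_{j,t}^T \hat{\Pi}_{j,t} - \hat{\Pi}_{j,t} \hat{A}_{j,t} + \hat{\Pi}_{j,t} \hat{B}_{j,t} R^{-1} \hat{B}_{j,t}^T \hat{\Pi}_{j,t}, \tag{12}$$

set $\hat{\theta}_{j,t} = [\hat{A}_{j,t}, \hat{B}_{j,t}, \hat{Q}_{j,t}]$, and calculate

$$\hat{\theta}_{j,t}^{pr} = \arg\min_{\psi \in \Theta} \|\hat{\theta}_{j,t} - \psi\|. \tag{13}$$

(c) Solve the MLE equation (14) at $\hat{\theta}_{j,t}^{pr} = [\hat{A}_{j,t}^{pr}, \hat{B}_{j,t}^{pr}, \hat{Q}_{j,t}^{pr}]$, $j \in Obs_i(N)$, to estimate $\zeta^0$ via:

$$\hat{\zeta}_{i,t}^{N_0} = \arg\min_{\zeta \in P} -\frac{1}{N_0} \log \Big( \prod_{j \in Obs_i(N)} f_\zeta(\hat{\theta}_{j,t}^{pr}) \Big) = \arg\min_{\zeta \in P} L(\hat{\theta}_t^{N_0}; \zeta), \quad N_0 = |Obs_i(N)|, \tag{14}$$

and solve the set of MF Equations (6) for all $\theta \in \Theta$ generating $x^*(\tau, \hat{\zeta}_{i,t}^{N_0})$, $t \le \tau < \infty$.

(iii) Solve the MF Control Law Equation at $\hat{\theta}_{i,t}^{pr}$ and $\hat{\zeta}_{i,t}^{N_0}$:

(a) $\hat{\Pi}_{i,t}$: Solve the Riccati Equation (15) at $\hat{\theta}_{i,t}^{pr}$:

$$\hat{A}_{i,t}^T \hat{\Pi}_{i,t} + \hat{\Pi}_{i,t} \hat{A}_{i,t} - \hat{\Pi}_{i,t} \hat{B}_{i,t} R^{-1} \hat{B}_{i,t}^T \hat{\Pi}_{i,t} + Q_i = 0. \tag{15}$$

(b) $\hat{s}_i(t) = s(t; \hat{\theta}_{i,t}^{pr}, \hat{\zeta}_{i,t}^{N_0})$: Solve the mass offset differential equation (16) at $\hat{\theta}_{i,t}^{pr}$ and $\hat{\zeta}_{i,t}^{N_0}$:

$$-\frac{d\hat{s}_i(\tau)}{d\tau} = (\hat{A}_{i,t}^T - \hat{\Pi}_{i,t} \hat{B}_{i,t} R^{-1} \hat{B}_{i,t}^T) \hat{s}_i(\tau) - Q_i x^*(\tau, \hat{\zeta}_{i,t}^{N_0}), \quad t \le \tau < \infty. \tag{16}$$

(c) Obtain the Certainty Equivalence Adaptive Control at $\hat{\theta}_{i,t}^{pr}$ and $\hat{\zeta}_{i,t}^{N_0}$:

$$\hat{u}_i^0(t) = -R^{-1} \hat{B}_{i,t}^T (\hat{\Pi}_{i,t} x_i(t) + \hat{s}_i(t)) + \xi_k [\varepsilon_i(t) - \varepsilon_i(k)], \quad t \in (k, k+1], \ k \in \mathbb{N}. \tag{17}$$

TABLE I
MF-SAC LAW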

The function $a(t)$, $t \ge 0$, in (9) is in the form of $a(t) = 1/f(r(t))$, where $r(t) = \|\Psi_0^{-1}\| + \int_0^t |\phi(s)|^2 ds$, and
$f \in \{f : \mathbb{R}_+ \to \mathbb{R}_+, f$ is slowly increasing and $\int_c^\infty 1/(x f(x))\,dx < \infty; c > 0\}$. The function $f(\cdot)$ is slowly
increasing if it is increasing and satisfies $f(\cdot) \ge 1$ and $f(x^2) = O(f(x))$ [26].

Note that a positive definite solution to the Riccati equation (15) exists as the projected estimate is in the set of
controllable and observable dynamical parameters: $\hat{\theta}_{i,t}^{pr} \in \Theta \subset \mathring{\Theta} \subset \mathbb{R}^{n(n+m+(n+1)/2)}$.
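
The RWLS recursion (9) can be explored with a simple Euler discretization. The sketch below is a minimal illustration for a scalar system $dx = (a x + b u)\,dt + d\,dw$, not the paper's procedure: the weight $f(x) = \log(e + x)^{1+\delta}$ is one assumed example of a slowly increasing function, the input $u = -x$ plus white excitation is an assumed stand-in for the certainty equivalence control (17), and no projection step is applied. All names and numbers are hypothetical.

```python
import math
import random

def slow_f(x, delta=0.5):
    """A slowly increasing weight: f(x) = log(e + x) ** (1 + delta)."""
    return math.log(math.e + x) ** (1.0 + delta)

def rwls_scalar(a_true, b_true, d, steps, dt, rng):
    """Euler discretization of the RWLS recursion (9) for dx = (a x + b u) dt + d dw."""
    theta = [0.0, 0.0]                      # running estimate of [a, b]
    p = [[1.0, 0.0], [0.0, 1.0]]            # gain matrix, initialized to the identity
    r = 1.0                                 # r(0) = norm of the inverse initial gain
    x, traces = 0.0, []
    for _ in range(steps):
        u = -x + rng.gauss(0.0, 1.0)        # assumed stabilizing feedback plus excitation
        dx = (a_true * x + b_true * u) * dt + d * rng.gauss(0.0, math.sqrt(dt))
        phi = (x, u)                        # regression vector [x, u]
        r += (phi[0] ** 2 + phi[1] ** 2) * dt
        a_t = 1.0 / slow_f(r)               # weight a(t) = 1 / f(r(t))
        p_phi = (p[0][0] * phi[0] + p[0][1] * phi[1],
                 p[1][0] * phi[0] + p[1][1] * phi[1])
        err = dx - (phi[0] * theta[0] + phi[1] * theta[1]) * dt
        theta = [theta[k] + a_t * p_phi[k] * err for k in range(2)]
        p = [[p[i][j] - a_t * p_phi[i] * p_phi[j] * dt for j in range(2)]
             for i in range(2)]
        traces.append(p[0][0] + p[1][1])    # trace of the gain never increases
        x += dx
    return theta, p, traces

theta_hat, p_final, traces = rwls_scalar(-1.0, 1.0, 0.5, steps=20_000, dt=0.01,
                                         rng=random.Random(0))
```

The monotone decrease of the gain trace reflects the structure of the second equation in (9); how close `theta_hat` comes to the true pair depends on the excitation, the horizon and the step size, so the sketch makes no accuracy claim.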

D. Asymptotic Properties of the MF-SAC Law 

A key feature of the work in this paper is that the state aggregation integration in (6) is performed by use of
the estimated distribution $F_{\hat{\zeta}_{i,t}^{N_0}}(\cdot)$ in place of the true distribution $F_{\zeta^0}(\cdot)$ (see (18) below). Then the central results
of this paper are the following: under the MF-SAC Law, asymptotically as the population tends to infinity, the 
competitive best response actions of the adaptive agents with no prior information on self dynamical parameters 
and no prior statistical information on dynamical and cost function parameters of the mass give rise to a unique 
Nash equilibrium. Moreover, the resulting cost for each agent from the MF-SAC Law is asymptotically almost 
surely equal to the cost resulting from the non-adaptive MF Stochastic Control Law. 

Theorem 2.2: MF-SAC Theorem 

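Let A1-A5, A7, A8 hold. Then, assume each agent $\mathcal{A}_i$, $1 \le i \le N$, is such that it:

(i) observes a random subset $Obs_i(N)$ of the total population $N$ such that $|Obs_i(N)| \to \infty$, $|Obs_i(N)|/N \to 0$,
as $N \to \infty$;

(ii) estimates its own parameter $\hat{\theta}_{i,t}$ via the RWLS (9);

(iii) estimates the population dynamical and cost function distribution parameter $\hat{\zeta}_{i,t}^{N_0}$ via MLE (14); and

(iv) computes $\hat{u}_i^0(t; \hat{\theta}_{i,t}, \hat{\zeta}_{i,t}^{N_0})$ via the extended MF equations plus dither.

Then,

(a) $\hat{\theta}_{i,t} \to \theta_i^0$ w.p.1 as $t \to \infty$, $1 \le i \le N$ (strong consistency);

(b) $\hat{\zeta}_{i,t}^{N_0} \to \zeta^0$ w.p.1 as $t \to \infty$, and $N \to \infty$, $1 \le i \le N$.

The MF-SAC Law generates a set of controls $\hat{\mathcal{U}}_{MF}^N = \{\hat{u}_i^0; 1 \le i \le N\}$, $1 \le N < \infty$, such that:

(c) all agent system trajectories $x_i$, $1 \le i \le N$, are LRA-$L^2$ stable w.p.1;

(d) $\epsilon$-Nash Property: $\{\hat{\mathcal{U}}_{MF}^N; 1 \le N < \infty\}$ yields an $\epsilon$-Nash Equilibrium for all $\epsilon$, i.e., for all $\epsilon > 0$, there exists
$N(\epsilon)$ such that for all $N \ge N(\epsilon)$

$$J_i^N(\hat{u}_i^0, \hat{u}_{-i}^0) - \epsilon \le \inf_{u_i \in \mathcal{U}_g^N} J_i^N(u_i, \hat{u}_{-i}^0) \le J_i^N(\hat{u}_i^0, \hat{u}_{-i}^0);$$

(e)

$$\lim_{N \to \infty} J_i^N(\hat{u}_i^0, \hat{u}_{-i}^0) = \lim_{N \to \infty} J_i^N(u_i^0, u_{-i}^0) \quad \text{w.p.1}, \quad 1 \le i \le N;$$

(f) Adaptive Control Performance Equals Complete Information Performance:

$$\lim_{N \to \infty} J_i^N(\hat{u}_i^0, \hat{u}_{-i}^0) = \lim_{N \to \infty} \inf_{u_i \in \mathcal{U}_g^N} J_i^N(u_i, u_{-i}^0) \quad \text{w.p.1}, \quad 1 \le i \le N.$$

The proof consists of the unification of the principal Theorems 3.2, 3.3, 4.2 and the Propositions 4.4 and 4.5 that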
are presented in the remaining sections. The outline of the proof is given in Appendix D. 

The technical plan of the paper is presented in three layers. The main theorem of the paper is Theorem 2.2. 
In the first layer, Propositions 4.4, 4.5 and Theorems 3.2, 3.3 and 4.2 support Theorem 2.2. In the second layer, 
Lemmas D.1, D.2, D.3, D.4, D.5, Theorem 4.3 and Proposition C.1 support Proposition 4.4 whereas Lemma 3.1
supports Theorem 3.2. In the third layer, Lemmas A.1, A.2, A.3 and Proposition 4.1 support Theorem 4.3.

III. Convergence Properties of the MF-SAC Parameter Estimates 

We show that for self dynamical parameter identification, the RWLS equations for dynamical parameters (9) 
with the projection method (10) provide strongly consistent, uniformly controllable and observable estimates. The 
population dynamical and cost function distribution parameter identification is handled in three steps. First, each 
agent obtains the dynamical parameter estimates for the agents in its observation set solving the RWLS equations 
(9). It is shown that the RWLS equations (9) with the projection method (10) applied on the observed agents' 
controlled trajectories also provide strongly consistent, uniformly controllable and observable estimates. Secondly, 
another set of RWLS equations (11) are solved using the previously obtained dynamical parameter estimates as 
inputs; and finally cost function parameter estimates are obtained for the agents in the observation set (12). We 
show that the estimates obtained are positive definite and uniformly bounded by use of a projection method (13). 
Finally, we show that the MLE scheme (14) employed using these estimates provides strongly consistent population 
distribution parameter estimates. 

A. Asymptotic Convergence of the Dynamical Parameter Estimates 

The RWLS algorithm is self-convergent [25], i.e., it converges to a certain random vector almost surely irrespective
of the control law design, but there is no guarantee that the estimated dynamical parameters will be controllable 
and observable, or the cost function estimates will be positive definite. To ensure that the sequence of estimated 
dynamical parameters are controllable, observable, uniformly bounded and the sequence of estimated cost parameters 
are positive definite and uniformly bounded we use the projection method [23]. 

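For self dynamical parameter identification, the self dynamical parameter estimate together with the cost function 
parameter $Q_i \in \mathbb{R}^{n(n+1)/2}$, namely $\hat\theta_{i,t} = [\hat A_{i,t}, \hat B_{i,t}, Q_i] \in \mathbb{R}^{n(n+m+(n+1)/2)}$, $t \ge 0$ 
($Q_i$ known by agent $\mathcal{A}_i$), is projected (the projection is denoted by $\hat\theta^{pr}_{i,t}$ in (10)) onto the compact set 
$\Theta|_{Q_i}$, a subset of the parameter set restricted to the given $Q_i$, where for the given $Q_i$, $[A_\theta, B_\theta]$ is 
controllable and $[Q_i^{1/2}, A_\theta]$ is observable. 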

For the distribution parameter identification, the population dynamical parameter estimates together with the cost 
function parameter estimates are projected onto the compact subset $\Theta$ of the set of controllable and observable 
dynamical parameters where, in addition, $Q_\theta$, $\theta \in \Theta$, is positive definite (for which the control law generated 
by (15) necessarily exists and is asymptotically stabilizing). 

Lemma 3.1: Let $\Theta$ be a compact set such that $\theta_i^0 \in \Theta \subset \mathbb{R}^{n(n+m+(n+1)/2)}$, $1 \le i \le N$. Set 
$\hat\theta_{i,t} = [\hat A_{i,t}, \hat B_{i,t}, \hat Q_{i,t}]$, $t \ge 0$. Let $[\hat A_{i,t}, \hat B_{i,t}]^T$ be the estimate of $[A_i^0, B_i^0]$ obtained by the RWLS 
equations (9), and let $\hat Q_{i,t}$ be the estimate of $Q_i^0$ obtained by the RWLS equations (11) and (12). Assume 
$\hat\theta_{i,t} \to \theta_i^0$ w.p.1 as $t \to \infty$, $1 \le i \le N$. Then 
$\hat\theta^{pr}_{i,t} = [\hat A^{pr}_{i,t}, \hat B^{pr}_{i,t}, \hat Q^{pr}_{i,t}] = \arg\min_{\psi \in \Theta} \|\hat\theta_{i,t} - \psi\|$ (together with a co-ordinate ordering 
measurable tie breaking rule) satisfies $\hat\theta^{pr}_{i,t} \in \Theta$ w.p.1 for all $t \ge 0$, and $\hat\theta^{pr}_{i,t} \to \theta_i^0$ w.p.1 as $t \to \infty$. In the SDPI 
case the corresponding result is achieved by setting $\hat Q_{i,t} = Q_i^0$ for all $t \ge 0$. ■ 
The Lemma is proved in Appendix B. 
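
The projection step of Lemma 3.1 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the compact set is a coordinate box, so the argmin is obtained by clipping; the set $\Theta$ of the paper (controllable, observable, positive definite parameters) is not a box, and the names `project_onto_box`, `lower`, `upper` and the numerical values are hypothetical.

```python
import math

def project_onto_box(theta_hat, lower, upper):
    """Euclidean projection of theta_hat onto the box prod_k [lower_k, upper_k].

    For a box, the minimiser of ||theta_hat - psi|| over psi in the set is
    unique and is obtained by clipping each coordinate, so no tie-breaking
    rule is needed in this simplified setting.
    """
    return [min(max(x, lo), hi) for x, lo, hi in zip(theta_hat, lower, upper)]

def dist(a, b):
    """Euclidean distance between two vectors given as lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical true parameter inside the box, and raw estimates converging
# to it from outside the box.
lower, upper = [-1.0, 0.5], [1.0, 2.0]
theta_true = [0.9, 0.6]
for t in (1, 10, 100, 1000):
    theta_hat = [theta_true[0] + 2.0 / t, theta_true[1] - 1.0 / t]
    theta_pr = project_onto_box(theta_hat, lower, upper)
    # The projected estimate stays in the set, and its error is at most
    # twice the raw estimation error (triangle inequality).
    print(t, theta_pr, dist(theta_pr, theta_true), dist(theta_hat, theta_true))
```

The printed errors of the projected estimates shrink with those of the raw estimates, which is the content of the lemma in this simplified setting.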

Now, given the projection method lemma, we show that the RWLS equations for dynamical parameters (9) and 
the RWLS equations for cost function parameters (11) generate strongly consistent estimates. 

Theorem 3.2: Let hypotheses A1-A3 hold, $x^* \in C_b[0,\infty)$, and let $([\hat A_{i,t}, \hat B_{i,t}];\ t \ge 0)$, $1 \le i \le N$, be the 
process of estimates obtained by the RWLS equations (9), and $(\hat Q_{i,t};\ t \ge 0)$ be the process of estimates obtained 
by (12) along the controlled trajectory $((x_{i,t}, \hat u^0_{i,t});\ t \ge 0)$, generated by the control $(\hat u^0_{i,t};\ t \ge 0)$ according to 
the MF-SAC Law (17). Furthermore, let $(\hat\theta^{pr}_{i,t} = [\hat A^{pr}_{i,t}, \hat B^{pr}_{i,t}, \hat Q^{pr}_{i,t}];\ t \ge 0)$ be the projected estimates according to 
Lemma 3.1. Then, 

(i) the input process given in (17) is well defined, 

(ii) $[\hat A_{i,t}, \hat B_{i,t}] \to [A_i^0, B_i^0]$ w.p.1 as $t \to \infty$, $1 \le i \le N$, 

(iii) with the optional assumption A6', $\hat Q_{i,t} \to Q_i^0$ w.p.1 as $t \to \infty$, $1 \le i \le N$. 

■ 

The theorem is proved in Appendix B using the methodology of [26], which establishes the convergence of 
the RWLS estimates (9) with diminishing excitation in the controls (17). The required uniform controllability and 
observability of the estimates is a consequence of Lemma 3.1 since $\hat\theta^{pr}_{i,t} \in \Theta$, $t \ge 0$. 
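
As a rough illustration of recursive identification along a controlled trajectory, the following sketch runs a standard discrete-time, unweighted recursive least squares estimator on a scalar system. It is not the continuous-time RWLS scheme (9) of the paper: the discrete-time model, the fixed feedback gain, the non-diminishing dither and all numerical values are assumptions made only for this example.

```python
import random

def rls_identify(a_true=0.8, b_true=0.5, steps=5000, seed=0):
    """Estimate [a, b] in x_{k+1} = a x_k + b u_k + w_k by recursive least squares."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]                  # current estimate of [a, b]
    P = [[100.0, 0.0], [0.0, 100.0]]    # inverse-information matrix (symmetric)
    x = 0.0
    for _ in range(steps):
        # Assumed control: fixed stabilising feedback plus an exciting dither.
        u = -0.5 * x + rng.gauss(0.0, 1.0)
        x_next = a_true * x + b_true * u + rng.gauss(0.0, 0.1)
        phi = [x, u]                    # regression vector
        P_phi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                 P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = 1.0 + phi[0] * P_phi[0] + phi[1] * P_phi[1]
        err = x_next - (theta[0] * phi[0] + theta[1] * phi[1])  # prediction error
        theta = [theta[0] + P_phi[0] / denom * err,
                 theta[1] + P_phi[1] / denom * err]
        # Rank-one update P <- P - (P phi)(P phi)^T / denom.
        P = [[P[i][j] - P_phi[i] * P_phi[j] / denom for j in range(2)]
             for i in range(2)]
        x = x_next
    return theta

print(rls_identify())
```

With the dither present, the regressors stay exciting and the estimates approach the assumed true values; the paper's result is stronger in that it allows the excitation to diminish over time.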

B. Asymptotic Convergence of the Population Distribution Parameter Estimates 

The MF-SAC Law specifies that the distribution parameter identification is such that each agent $\mathcal{A}_i$, $1 \le i \le N$, 
observes the control and state trajectories of a random subset of agents $Obs_i(N)$, $1 \le i \le N$, and at each time 
iteration applies (9) to obtain the dynamical parameter estimates of each agent in its set. The MLE scheme (14) 
is then applied to these estimated parameters of the agents in $Obs_i(N)$, $1 \le i \le N$, for $t \ge 0$, to obtain an estimate 
of the distribution parameter. To obtain the strong consistency of the distribution parameter estimates we adopt the 
hypotheses A7 and A8 below. 

A7: There exists a bounded continuous (on $\Theta \times P$) family of densities $\{f_\zeta(\theta);\ \theta \in \Theta,\ \zeta \in P\}$ for the 
family of dynamical and cost function parameter distributions $\{F_\zeta(\cdot);\ \zeta \in P\}$. Further, the density 
$f_\zeta(\theta)$ is bounded away from 0 uniformly over $\Theta \times P$, i.e., $f_\zeta(\theta) > \delta$ for some $\delta > 0$ for all $\theta \in \Theta$ and $\zeta \in P$. 
Moreover, for each $j$, $1 \le j \le p$, $(\partial f_\zeta/\partial\zeta_j)(\theta)$ exists for all $\zeta \in P$, and is uniformly bounded on $\Theta \times P$, except 
possibly on a Lebesgue null set independent of $\zeta \in P$. ■ 

For (14), let $f(\theta^{[1:N_0]}; \zeta)$ be the likelihood function of $f_\zeta$ at $\theta^{[1:N_0]} \triangleq \{A_j, B_j, Q_j;\ j \in Obs_i(N),\ N_0 = |Obs_i(N)|\}$, 
and let $L(\theta^{[1:N_0]}; \zeta)$ be the continuously differentiable monotonically decreasing function of $f(\theta^{[1:N_0]}; \zeta)$ 
given by the scaled log-likelihood function $-(1/N_0) \log f(\theta^{[1:N_0]}; \zeta)$. 

A8: $\{f_\zeta(\cdot);\ \zeta \in P\}$ satisfies: 

\[
E_{\zeta^0}[\log f_\zeta(\theta)] = E_{\zeta^0}[\log f_{\zeta^0}(\theta)] \implies \zeta = \zeta^0,
\]

for all $\zeta, \zeta^0 \in P$, where $\zeta^0$ is the true parameter. ■ 
Theorem 3.3: Let A1-A3, A7, A8 hold; let $|Obs_i(N)| \to \infty$ and $|Obs_i(N)|/N \to 0$ as $N \to \infty$, $1 \le i \le N$, and 
let $(\hat\zeta^{N_0}_{i,t}(\hat\theta_t^{[1:N_0]});\ t \ge 0)$, $N_0 = |Obs_i(N)|$, be the MLE process given by (14) along the controlled trajectories of 
the observed set of agents $((x_{j,t}, \hat u^0_{j,t});\ t \ge 0,\ j \in Obs_i(N))$ generated by the controls $(\hat u^0_{j,t};\ t \ge 0,\ j \in Obs_i(N))$ 
(17). Then, $\hat\zeta^{N_0}_{i,t}$ is strongly consistent at $\zeta^0$, that is, $\lim_{N\to\infty} \lim_{t\to\infty} \hat\zeta^{N_0}_{i,t}(\hat\theta_t^{[1:N_0]}) = \zeta^0$ w.p.1, $1 \le i \le N$. ■ 
The proof is given in Appendix B. 
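
The role of the conditions $|Obs_i(N)| \to \infty$ and $|Obs_i(N)|/N \to 0$ can be seen in a small sketch. It assumes a scalar Gaussian parameter distribution with unknown mean and unit variance, for which the MLE is the sample average; this family is chosen only for simplicity and does not satisfy the compact-support and lower-bound requirements of A7. The observed parameters are taken to be known exactly, which corresponds to the $t \to \infty$ limit of the estimates. The function name and values are hypothetical.

```python
import math
import random

def mle_mean_from_subset(N, zeta_true=1.5, seed=0):
    """MLE of a Gaussian mean from a random observed subset of size isqrt(N)."""
    rng = random.Random(seed)
    # Population of scalar parameters drawn i.i.d. from N(zeta_true, 1).
    thetas = [rng.gauss(zeta_true, 1.0) for _ in range(N)]
    # Observed-set size grows without bound while its fraction of N vanishes.
    n_obs = math.isqrt(N)
    observed = rng.sample(thetas, n_obs)
    # For the Gaussian mean, minimising the scaled negative log-likelihood
    # gives the sample average.
    zeta_hat = sum(observed) / n_obs
    return n_obs, zeta_hat

for N in (100, 10_000, 250_000):
    n_obs, zeta_hat = mle_mean_from_subset(N)
    print(N, n_obs, n_obs / N, zeta_hat)
```

The observed fraction `n_obs / N` decays to zero as `N` grows, while the estimate still concentrates around the assumed true value because `n_obs` itself tends to infinity.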

IV. The Principal Asymptotic Results 
A. Asymptotic Behaviour of the MF Equations 

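The MF Equations (6) that permit the calculation of the mass tracking signal $x^*(\tau, \zeta)$, $t \le \tau < \infty$, are 
dependent on the population distribution parameter $\zeta$. Correspondingly, the MF Equations of the MF-SAC Law on 
$[t, \infty)$, $t \ge 0$, with the strongly consistent distribution parameter estimate $\hat\zeta^{N_0}_{i,t}$, $t \ge 0$, $1 \le i \le N$, are given below. 

Definition 4.1: MF-SAC Equation System on $[t, \infty)$: 

\begin{align}
-\frac{ds_\theta}{d\tau} &= (A_\theta^T - \Pi_\theta B_\theta R^{-1} B_\theta^T)\, s_\theta - Q_\theta\, x^*(\tau, \hat\zeta^{N_0}_{i,t}), \notag\\
\frac{d\bar x_\theta}{d\tau} &= (A_\theta - B_\theta R^{-1} B_\theta^T \Pi_\theta)\, \bar x_\theta - B_\theta R^{-1} B_\theta^T s_\theta, \notag\\
\bar x(\tau, \hat\zeta^{N_0}_{i,t}) &= \int_\Theta \bar x_\theta\, dF_{\hat\zeta^{N_0}_{i,t}}(\theta), \tag{18}\\
x^*(\tau, \hat\zeta^{N_0}_{i,t}) &= m\big(\bar x(\tau, \hat\zeta^{N_0}_{i,t}) + \nu\big), \quad t \le \tau < \infty. \notag
\end{align}

■ 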

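Proposition 4.1: For the system (1) let A1-A4, A7, A8 hold. For agent $\mathcal{A}_i$, $1 \le i \le N$, let: (i) $\hat\theta^{pr}_{i,t}$ be the 
solution to (10), and $\hat\zeta^{N_0}_{i,t}$ be the solution to (14) in the MF-SAC Law; (ii) $x^*(\tau, \hat\zeta^{N_0}_{i,t})$, $t \le \tau < \infty$, be the solution 
to the MF-SAC Equation System (18), and $x^*(\tau, \zeta^0)$, $t \le \tau < \infty$, be the solution to the MF Equation System (6); 
(iii) $s(t; \hat\theta^{pr}_{i,t}, \hat\zeta^{N_0}_{i,t})$ be the solution to (16) in the MF-SAC Law, and $s(t; \theta_i^0, \zeta^0)$ be the solution to the mass offset 
function differential equation (4). Then, 

(i) $\lim_{N\to\infty} \lim_{t\to\infty} x^*(\tau, \hat\zeta^{N_0}_{i,t}) = x^*(\tau, \zeta^0)$ w.p.1, $t \le \tau < \infty$, $1 \le i \le N$, 

(ii) $\lim_{N\to\infty} \lim_{t\to\infty} s(t; \hat\theta^{pr}_{i,t}, \hat\zeta^{N_0}_{i,t}) = s(t; \theta_i^0, \zeta^0)$ w.p.1, $1 \le i \le N$, 

(iii) The input process given in (17) is well defined and is given at $\hat\theta^{pr}_{i,t}$ and $\hat\zeta^{N_0}_{i,t}$ by 

\[
\hat u_i^0(t; \hat\theta^{pr}_{i,t}, \hat\zeta^{N_0}_{i,t}) = -R^{-1} (\hat B^{pr}_{i,t})^T \big( \hat\Pi_{i,t}\, x_i(t) + s(t; \hat\theta^{pr}_{i,t}, \hat\zeta^{N_0}_{i,t}) \big) + \xi_k [\epsilon_i(t) - \epsilon_i(k)].
\]

The result is proved in Appendix C. 

B. Asymptotic Behaviour of System Trajectories 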

We show that under the hypotheses that the self dynamical parameter estimates and the population distribution 
parameter estimates converge to their true values, the trajectories of adaptive individual agents are stable in the 
$L^2$ long run average (LRA) sense. Moreover, these trajectories and the corresponding control actions converge to the 
non-adaptive values obtained with the true parameters. 

Recall that $\mathcal{U}^N_{MF} = \{u_i^0;\ 1 \le i \le N\}$ is the set of controls generated by the non-adaptive MF Stochastic Control 
Law, while $\hat{\mathcal{U}}^N_{MF} = \{\hat u_i^0;\ 1 \le i \le N\}$ is the set of controls generated by the MF-SAC Law. 

Using the notation $\hat\theta_i^{0,t} \triangleq (\hat\theta_{i,\tau};\ 0 \le \tau \le t)$ and $\hat\zeta_i^{0,t}(N_0) \triangleq (\hat\zeta^{N_0}_{i,\tau};\ 0 \le \tau \le t)$, let 
$\hat x_i^0 \triangleq x_i(\cdot; \hat\theta_i^{0,t}, \hat\zeta_i^{0,t}(N_0))$ be the state trajectory of agent $\mathcal{A}_i$, $1 \le i \le N$, under the control law 
$\hat u_i^0(t; \hat\theta_{i,t}, \hat\zeta^{N_0}_{i,t}) \in \hat{\mathcal{U}}^N_{MF}$, and $x_i^0 \triangleq x_i(\cdot; \theta_i^0, \zeta^0)$ be the state trajectory of agent $\mathcal{A}_i$ under the control law 
$u_i^0 = u_i^0(t; \theta_i^0, \zeta^0) \in \mathcal{U}^N_{MF}$, where $\hat\theta_{i,t}$ is the solution to (10), and $\hat\zeta^{N_0}_{i,t}$ is the solution to (14). 

Theorem 4.2: Let A1-A4 hold; then, the process $(\hat x_i^0(t);\ t \ge 0)$, $1 \le i \le N$, is stable in the sense that 

\[
\sup_{N \ge 1}\ \max_{1 \le i \le N}\ \lim_{T\to\infty} \frac{1}{T} \int_0^T \|\hat x_i^0(t)\|^2\, dt < \infty \quad \text{w.p.1.}
\]

■ 

Proof: It has been shown in Theorem 3.2 that $\hat\theta_{i,t} \to \theta_i^0$ w.p.1, and in Theorem 3.3 that $\hat\zeta^{N_0}_{i,t} \to \zeta^0$ w.p.1 as 
$t \to \infty$ and $N \to \infty$, $1 \le i \le N$. Moreover, it has already been shown in Proposition 4.1 that the tracking signal 
$x^*(\tau, \hat\zeta^{N_0}_{i,t}) \in C_b[0, \infty)$, and the input process is well defined. All the hypotheses in [26] are satisfied, and Theorem 
1 in [26] proves the claim. ■ 

Theorem 4.3: For the system (1), under A1-A4, A7, A8 

\[
\limsup_{N\to\infty}\ \limsup_{T\to\infty} \frac{1}{T} \int_0^T \|\hat x_i^0 - x_i^0\|^2\, dt = 0 \quad \text{w.p.1}, \quad 1 \le i \le N.
\]

■ 

The result is proved in Appendix C. 

C. Asymptotic Behaviour of Cost Functions 

In the population limit, the asymptotic cost of an agent performing the MF-SAC Law in a system within which 
all of the agents are adaptive is almost surely equal to the cost of an agent in a system of agents all of which 
are performing the non-adaptive MF-SC Law. This is shown in Proposition 4.4, whose proof is given in Appendix D. 
Moreover, Proposition 4.5 shows that in the population limit, the best response of an agent in a population of 
agents performing the MF-SAC Law is almost surely equal to the best response of an agent in a population of 
agents performing the non-adaptive MF-SC Law. The proof is given in Appendix D. 

Proposition 4.4: For the system (1), let A1-A4, A7, A8 hold, let $u_i^0 \in \mathcal{U}^N_{MF}$, $1 \le i \le N$, be the set of controls 
generated by the non-adaptive $(\theta^0, \zeta^0)$ MF Stochastic Control Law, and let $\hat u_i^0 \in \hat{\mathcal{U}}^N_{MF}$, $1 \le i \le N$, be the set of 
controls generated by the MF-SAC Law. Then, 

\[
\lim_{N\to\infty} J_i^N(\hat u_i^0, \hat u_{-i}^0) = \lim_{N\to\infty} J_i^N(u_i^0, u_{-i}^0) \quad \text{w.p.1}, \quad 1 \le i \le N. \tag{19}
\]

Proposition 4.5: For the system (1), under A1-A4, A7, A8, with $u_i \in \mathcal{U}_g^N$, $\hat u_i^0 \in \hat{\mathcal{U}}^N_{MF}$, $u_i^0 \in \mathcal{U}^N_{MF}$, $1 \le i \le N$, the 
following holds: 

\[
\lim_{N\to\infty}\ \inf_{u_i} J_i^N(u_i, \hat u_{-i}^0) = \lim_{N\to\infty}\ \inf_{u_i} J_i^N(u_i, u_{-i}^0) \quad \text{w.p.1}, \quad 1 \le i \le N. \tag{20}
\]

■ 

V. Simulations 

Consider a system of 400 agents where each agent is modeled by a 2 dimensional system. All agents apply the 
MF-SAC Law; each of the 400 agents observes the outputs and control inputs of its own set of 20 randomly chosen 
agents, as well as its own trajectory. Rapid convergence of the state trajectories of all agents to the steady state 
values can be seen in Fig. 1, where 'x' and 'y' represent the two dimensions of each agent's state and 't' denotes time. 
To show the convergence of the self identification of the dynamical parameters $A_i$, $1 \le i \le N$, we plot the norm 
trajectories of the estimates in Fig. 2. The symbol '*' denotes the true value of the parameter for each agent. Only 10 
randomly chosen agents are shown in Fig. 2 for clarity of presentation. In Fig. 3, we depict each agent's estimate of the 
mean of the dynamical parameter A (i.e., the mean of the random variable A), and we display 10 randomly chosen 
agents' estimate trajectories for clarity. Again, the norm of the estimates and the true values are displayed in this 
diagram. The resulting parameter estimate is different for each agent because each agent only observes 
20 randomly chosen agents out of a system of 400 agents. 
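
The agent-to-agent variation of the population estimate described above can be sketched as follows. This is not the simulation of the paper: it assumes a scalar stand-in for the parameter A with a hypothetical Gaussian distribution, exact knowledge of the observed agents' parameters, and observation sets that exclude the observing agent itself.

```python
import random
import statistics

def per_agent_mean_estimates(n_agents=400, n_obs=20, seed=1):
    """Each agent averages the parameters of its own random observation set."""
    rng = random.Random(seed)
    # Hypothetical scalar parameters, one per agent.
    params = [rng.gauss(-1.0, 0.2) for _ in range(n_agents)]
    estimates = []
    for i in range(n_agents):
        others = [j for j in range(n_agents) if j != i]
        obs = rng.sample(others, n_obs)      # agent i's observation set
        estimates.append(statistics.fmean(params[j] for j in obs))
    return params, estimates

params, estimates = per_agent_mean_estimates()
print("population mean:", statistics.fmean(params))
print("spread of the agents' estimates:", statistics.pstdev(estimates))
```

Each agent ends up with a different estimate of the population mean, with a spread of roughly the parameter standard deviation divided by the square root of the observation-set size.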

VI. Conclusion 

This paper presents a study of the mean field stochastic adaptive control problem where the cost functions of the 
agents in a population are coupled, and each agent estimates its own dynamical parameters based upon observations 
of its own trajectory, and furthermore estimates the distribution parameter of the population's dynamical and cost 
function parameters by observing a randomly chosen fraction of the population. This work makes a contribution to 
the mean field literature by extending the established $\epsilon$-Nash equilibrium results of a large population of egoistic 
agents to a large population of adaptive egoistic agents. The information requirement for each agent is kept limited 
in the sense that the distribution parameter is estimated only through an observed set of agents, where the ratio of the 
cardinality of the observed set to the number of agents in the population becomes negligible as the population size 
grows to infinity. The strong consistency of the self parameter estimates and the distribution parameter estimates, 
the stability of all agent systems, and an $\epsilon$-Nash equilibrium property are all established in the paper. 

Future research directions include: (i) investigation of the influence of various rates of observed population 
fraction decay and rates of convergence on the results in this paper, together with (ii) the extension of adaptive MF 
theory to (a) the currently developing areas of distance dependent cost function influence among agents [39], (b) 
altruist and egoist MF theory [40] and (c) problems involving partially observed systems. 

Fig. 1. State Trajectories 

Fig. 2. Self Dynamical Parameter Identification 

Fig. 3. Population Parameter Identification 

Appendix A 

Preparatory Lemmas on Asymptotic Dynamics and Dither Inputs 

Four basic properties to be used in the sequel are given in the following lemmas. 

Lemma A.1: Let $A(\omega)$ be an asymptotically stable random matrix in $\mathbb{R}^{n^2}$ for all $\omega \in \Omega$ except on a $P$-null set 
$\mathcal{N}$, and $A_t(\omega)$, $t \ge 0$, be a bounded random matrix function of $t \ge 0$. If for all $\omega \in \Omega\setminus\mathcal{N}$ and all $\epsilon > 0$, there 
exists $T_\omega = T_\omega(\epsilon)$ such that $t > T_\omega$ implies $\|A_t - A\| < \epsilon$, i.e. $A_t \to A$ w.p.1 as $t \to \infty$, then $(A_t,\ t \ge 0)$ is 
an exponentially stable time varying matrix w.p.1, in the sense that its fundamental matrix satisfies the estimate 
$\|\Phi_{t,s}\| \le \beta e^{-\rho(t-s)}$ for $t_0 \le s \le t$, where $\rho(\omega) > 0$ and $0 < \beta(\omega) < \infty$. 
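Proof: 

Suppressing mention of $\omega \in \Omega\setminus\mathcal{N}$ whenever possible for simplicity of notation, we consider 

\begin{align}
\dot x(t) &= A x(t), \quad t \ge 0; \quad x(0) = x_0 \in \mathbb{R}^n, \quad \text{and} \tag{21}\\
\dot x_a(t) &= A_t x_a(t), \quad t \ge 0; \quad x_a(0) = x_a \in \mathbb{R}^n. \tag{22}
\end{align}

Since $A$ is asymptotically stable, we may form the Lyapunov function $V(x) = x^T \Pi x$, where $\Pi > 0$ satisfies 
$\Pi A + A^T \Pi = -Q$ for some $Q > 0$. 
Now, 

\begin{align}
\dot V(x(t)) &= \dot x^T(t) \Pi x(t) + x^T(t) \Pi \dot x(t) \tag{23}\\
&= x^T(t) [\Pi A + A^T \Pi] x(t) \tag{24}\\
&= -x^T(t) Q x(t), \quad t \ge 0. \tag{25}
\end{align}

Then writing 

\[
A_t^T \Pi + \Pi A_t = (A_t - A)^T \Pi + A^T \Pi + \Pi (A_t - A) + \Pi A, \quad t \ge 0, \tag{26}
\]

we see that for all $\omega \in \Omega\setminus\mathcal{N}$ there exists sufficiently large $T_\omega$ such that for all $t > T_\omega$, 

\[
A_t^T \Pi + \Pi A_t \le -Q + \frac{Q}{2} = -\frac{Q}{2}. \tag{27}
\]

Therefore, 

\[
\dot V_a(x_a(t)) = \frac{d}{dt}\big[x_a^T(t) \Pi x_a(t)\big] \le -x_a^T(t) \frac{Q}{2} x_a(t) \le -\Big(\frac{\lambda_{\min}(Q)}{2\lambda_{\max}(\Pi)}\Big) \big(x_a^T(t) \Pi x_a(t)\big) < 0, \tag{28}
\]

which implies $\dot V_a(x_a(t)) \le -\alpha V_a(x_a(t))$, where $\alpha = \lambda_{\min}(Q) / (2\lambda_{\max}(\Pi))$, which gives 

\[
V_a(x_a(t)) \le V_a(x_a(t_0))\, e^{-\alpha(t - t_0)}. \tag{29}
\]

Now, for the fundamental matrix, we have 

\[
\|\Phi(t, t_0)\| = \sup_{x_{t_0} \ne 0} \frac{\|\Phi(t, t_0) x_{t_0}\|}{\|x_{t_0}\|} = \sup_{x_{t_0} \ne 0} \frac{\|x_t\|}{\|x_{t_0}\|}. \tag{30}
\]

Without loss of generality, take $\|x_{t_0}\| = 1$. Then, 

\begin{align}
\|\Phi(t, t_0)\| &= \sup_{x_{t_0}} \|x_t\|_2 \tag{31}\\
&\le \sup_{x_{t_0}} \Big(\frac{x_t^T \Pi x_t}{\lambda_{\min}(\Pi)}\Big)^{1/2} \tag{32}\\
&\le \frac{1}{\sqrt{\lambda_{\min}(\Pi)}} \sup_{x_{t_0}} \big(V(x_t)\big)^{1/2} \tag{33}\\
&\le \frac{1}{\sqrt{\lambda_{\min}(\Pi)}} \sup_{x_{t_0}} e^{-\frac{\alpha}{2}(t - t_0)} \big(V(x_{t_0})\big)^{1/2} \tag{34}\\
&\le \frac{1}{\sqrt{\lambda_{\min}(\Pi)}}\, e^{-\frac{\alpha}{2}(t - t_0)} \sup_{x_{t_0}} \big(x_{t_0}^T \Pi x_{t_0}\big)^{1/2} \tag{35}\\
&\le e^{-\rho(t - t_0)} \Big(\frac{\lambda_{\max}(\Pi)}{\lambda_{\min}(\Pi)}\Big)^{1/2} \tag{36}\\
&\le \beta e^{-\rho(t - t_0)}, \quad t \ge t_0; \quad \text{when } \rho = \frac{\alpha}{2} \text{ and } \beta = \Big(\frac{\lambda_{\max}(\Pi)}{\lambda_{\min}(\Pi)}\Big)^{1/2}. \tag{37}
\end{align}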

■ 
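
A scalar worked example may help to see the kind of bound asserted by Lemma A.1. The choice $A = -1$, $A_t = -1 + 1/(1+t)$ below is an assumption made only for illustration and does not come from the paper.

```latex
% Assumed scalar example: A = -1 and A_t = -1 + 1/(1+t), so that A_t -> A as t -> infinity.
\begin{align*}
\Phi_{t,s} &= \exp\Big(\int_s^t \big(-1 + \tfrac{1}{1+r}\big)\,dr\Big)
            = \frac{1+t}{1+s}\,e^{-(t-s)}, \qquad 0 \le s \le t,\\
\frac{1+t}{1+s} &= 1 + \frac{t-s}{1+s} \le 1 + (t-s),\\
|\Phi_{t,s}| &\le \big(1 + (t-s)\big)\,e^{-(t-s)/2}\,e^{-(t-s)/2}
             \le 2e^{-1/2}\,e^{-(t-s)/2},
\end{align*}
% since max_{u >= 0} (1+u) e^{-u/2} = 2 e^{-1/2}, attained at u = 1.
```

In this example the estimate of the lemma holds with $\beta = 2e^{-1/2}$ and $\rho = 1/2$; the rate is slower than that of the limit matrix, which is typical when the perturbation $A_t - A$ decays slowly.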

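Lemma A.2: Let $(A_t,\ t \ge 0)$ be a random bounded matrix sequence on $(\Omega, \mathcal{F}, P)$, which converges almost surely 
to the asymptotically stable matrix $A^0$ as $t \to \infty$; let $\Psi_{t,t_0}$ be defined by $\frac{d}{dt}\Psi_{t,t_0} = A^0 \Psi_{t,t_0}$, i.e. $\Psi_{t,t_0} = e^{A^0(t - t_0)}$, 
and let $\frac{d}{dt}\Phi_{t,t_0} = A_t \Phi_{t,t_0}$ with $\Phi_{t_0,t_0} = \Psi_{t_0,t_0} = I$. Then the following limit holds: 

\[
\lim_{T\to\infty} \frac{1}{T} \int_{t_0}^T \|\Phi_{t,t_0} - \Psi_{t,t_0}\|^2\, dt = 0 \quad \text{w.p.1.}
\]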
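
As a check on the statement, the limit can be computed in closed form in a scalar case. The choice $A^0 = -1$, $A_t = -1 + 1/(1+t)$, $t_0 = 0$ is an assumed example and is not taken from the paper.

```latex
% Assumed scalar example: A^0 = -1, A_t = -1 + 1/(1+t), t_0 = 0.
\begin{align*}
\Psi_{t,0} &= e^{-t}, \qquad
\Phi_{t,0} = \exp\Big(\int_0^t \big(-1 + \tfrac{1}{1+s}\big)\,ds\Big) = (1+t)\,e^{-t},\\
\Phi_{t,0} - \Psi_{t,0} &= t\,e^{-t},\\
\frac{1}{T}\int_0^T |\Phi_{t,0} - \Psi_{t,0}|^2\,dt
  &\le \frac{1}{T}\int_0^\infty t^2 e^{-2t}\,dt = \frac{1}{4T} \longrightarrow 0
  \quad (T \to \infty).
\end{align*}
```

Here the squared difference is integrable on $[0, \infty)$, so its time average vanishes at rate $1/T$; the lemma asserts the vanishing time average in general, without a rate.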

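The proof is given in four steps below. 

(i) Integral Representation of $I^T$: 

For almost all $\omega \in \Omega$, we have $A_t(\omega) \to A^0(\omega)$ as $t \to \infty$, restricting attention to the probability 1 subset 
$\Omega_0 \subset \Omega$ on which a unique solution exists. Since 

\[
\frac{d}{dt}\Psi_{t,t_0} = A^0 \Psi_{t,t_0} \text{ with } \Psi_{t_0,t_0} = I, \quad \text{and} \quad \frac{d}{dt}\Phi_{t,t_0} = A_t \Phi_{t,t_0} \text{ with } \Phi_{t_0,t_0} = I, \tag{38}
\]

we have 

\begin{align}
\frac{d}{dt}(\Phi_{t,t_0} - \Psi_{t,t_0}) &= A_t \Phi_{t,t_0} - A^0 \Psi_{t,t_0} \tag{39}\\
&= A_t \Phi_{t,t_0} - (A^0 - A_t) \Psi_{t,t_0} - A_t \Psi_{t,t_0} \tag{40}\\
&= A_t (\Phi_{t,t_0} - \Psi_{t,t_0}) - (A^0 - A_t) \Psi_{t,t_0}. \tag{41}
\end{align}

Integrating we obtain, 

\[
\Phi_{t,t_0} - \Psi_{t,t_0} = \Phi_{t,t_0} (\Phi_{t_0,t_0} - \Psi_{t_0,t_0}) - \int_{t_0}^t \Phi_{t,s} (A^0 - A_s) \Psi_{s,t_0}\, ds, \tag{42}
\]

with the initial condition $\Phi_{t_0,t_0} - \Psi_{t_0,t_0} = I - I = 0$. Therefore, 

\[
\Phi_{t,t_0} - \Psi_{t,t_0} = -\int_{t_0}^t \Phi_{t,s} (A^0 - A_s)\, e^{A^0(s - t_0)}\, ds, \quad t \ge t_0, \tag{43}
\]

and so, 

\[
I^T \triangleq \frac{1}{T} \int_{t_0}^T \|\Phi_{t,t_0} - \Psi_{t,t_0}\|^2\, dt = \frac{1}{T} \int_{t_0}^T \Big\| \int_{t_0}^t \Phi_{t,s} (A^0 - A_s)\, e^{A^0(s - t_0)}\, ds \Big\|^2 dt. \tag{44}
\]

(ii) $I^T = I_1^T + I_2^T$; Convergence of $I_1^T$: 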

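Let us split the integrals in (44) as follows: 

\[
I^T = \frac{1}{T} \int_{t_0}^{T_\omega} \|\cdot\|^2\, dt + \frac{1}{T} \int_{T_\omega}^{T} \|\cdot\|^2\, dt =: I_1^T + I_2^T, \tag{45}
\]

where the inner integrals are denoted by '$\cdot$' for brevity in this definition and $T_\omega > t_0$ is a random instant whose 
value is to be determined later. 

We take the norm inside the integral in $I_1^T$; then by use of the Cauchy Schwarz Inequality (henceforth termed 
CS) we may bound $I_1^T$ above as in 

\[
I_1^T \le \frac{1}{T} \int_{t_0}^{T_\omega} \Big( (t - t_0) \int_{t_0}^t \|\Phi_{t,s}\, e^{A^0(s - t_0)}\|^2\, \|\Delta_s\|^2\, ds \Big) dt, \quad t_0 \le t \le T_\omega, \tag{46}
\]

where $\Delta_s = (A^0 - A_s)$, $s \ge 0$. 

Next, we may bound $\|\Delta_s\|$ above by some $M_\Delta^{T_\omega}$ for $t_0 \le s \le T_\omega$, and we may bound $\|e^{A^0(s - t_0)}\|$ by 
$\beta_{A^0} e^{-\rho_{A^0}(s - t_0)}$, for some $\beta_{A^0} > 0$. Moreover $\|\Phi_{t,s}\| \le M_\Phi^{T_\omega}$, for all $t_0 \le s \le t \le T_\omega$, for some $M_\Phi^{T_\omega} < \infty$, 
by the continuity of solutions to (38). 

Then, 

\[
\limsup_{T\to\infty} I_1^T \le \limsup_{T\to\infty} \frac{\beta_{A^0}^2}{T} \big(M_\Delta^{T_\omega}\big)^2 \big(M_\Phi^{T_\omega}\big)^2 \int_{t_0}^{T_\omega} (t - t_0) \Big( \int_{t_0}^t e^{-2\rho_{A^0}(s - t_0)}\, ds \Big) dt = \lim_{T\to\infty} \frac{1}{T}\, \kappa\, g(t_0, T_\omega), \tag{47}
\]

where $\kappa = \beta_{A^0}^2 \big(M_\Delta^{T_\omega}\big)^2 \big(M_\Phi^{T_\omega}\big)^2 < \infty$ and $g(\cdot)$ is a bounded continuous function of $t_0$ and $T_\omega$. Hence for a fixed $T_\omega$, 
$\limsup_{T\to\infty} \frac{1}{T} \kappa\, g(t_0, T_\omega) = 0$. Therefore $I_1^T$ tends to 0 as $T$ tends to $\infty$. 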

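(iii) $I_2^T = I_{21}^T + I_{22}^T$; Convergence of $I_{21}^T$: 

For the second integral $I_2^T$ in (45), we have 

\begin{align}
I_2^T &= \frac{1}{T} \int_{T_\omega}^{T} \Big\| \int_{t_0}^t \Phi_{t,s} (A^0 - A_s)\, e^{A^0(s - t_0)}\, ds \Big\|^2 dt \tag{48}\\
&\le \frac{1}{T} \int_{T_\omega}^{T} \Big( (t - t_0) \int_{t_0}^t \|\Phi_{t,s}\, e^{A^0(s - t_0)}\|^2\, \|\Delta_s\|^2\, ds \Big) dt, \quad t_0 \le s \le t \le T, \tag{49}\\
&= \frac{1}{T} \int_{T_\omega}^{T} \Big( (t - t_0) \int_{t_0}^{T_\omega} \|\cdot\|^2\, ds + (t - t_0) \int_{T_\omega}^{t} \|\cdot\|^2\, ds \Big) dt =: I_{21}^T + I_{22}^T, \tag{50}
\end{align}

where we split the inner integral and use the $(\cdot)$ notation for brevity. 

Using the semi-group property of the state transition matrix, we may write $\Phi_{t,s} = \Phi_{t,T_\omega} \Phi_{T_\omega,s}$ for all $t_0 \le s \le 
T_\omega \le T$. But we have $\sup_{t_0 \le s \le T_\omega} \|\Phi_{T_\omega,s}\| =: M_\Phi^{T_\omega} < \infty$, and we have $M_\Delta^{T_\omega} := \sup_{t_0 \le s \le T_\omega} \|\Delta_s\|$. Therefore, 

\[
I_{21}^T \le \frac{1}{T} \big(M_\Delta^{T_\omega}\big)^2 \big(M_\Phi^{T_\omega}\big)^2 \int_{T_\omega}^{T} \Big( (t - t_0) \int_{t_0}^{T_\omega} \|\Phi_{t,T_\omega}\, e^{A^0(s - t_0)}\|^2\, ds \Big) dt, \quad t_0 \le s \le T_\omega \le t \le T. \tag{51}
\]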

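Concerning $I_{22}^T$, the random time $T_\omega$ is chosen so that for $s > T_\omega$ (increasing the value over that used in (47) 
without affecting that argument), $\|\Delta_s\| < \epsilon$. Hence, 

\[
I_{22}^T \le \frac{1}{T}\, \epsilon^2 \int_{T_\omega}^{T} \Big( (t - t_0) \int_{T_\omega}^{t} \|\Phi_{t,s}\, e^{A^0(s - t_0)}\|^2\, ds \Big) dt, \quad t_0 \le T_\omega \le s \le t \le T. \tag{52}
\]

From Lemma A.1, $\Phi_{t,t_0}$ satisfies the bound $\|\Phi_{t,t_0}\| \le \beta_\Phi e^{-\rho_\Phi(t - t_0)}$, $t \ge t_0$, where $\beta_\Phi = \beta_\Phi(\omega)$, $\rho_\Phi = \rho_\Phi(\omega)$. 
Finally, bounding $\|e^{A^0(s - t_0)}\|$ by $\beta_{A^0} e^{-\rho_{A^0}(s - t_0)}$ yields 

\[
I_{21}^T \le \frac{\beta_{A^0}^2}{T} \big(M_\Delta^{T_\omega}\big)^2 \big(M_\Phi^{T_\omega}\big)^2 \beta_\Phi^2 \int_{T_\omega}^{T} \Big( (t - t_0) \int_{t_0}^{T_\omega} e^{-2\rho_\Phi(t - T_\omega)}\, e^{-2\rho_{A^0}(s - t_0)}\, ds \Big) dt, \quad t_0 \le s \le T_\omega \le t \le T. \tag{53}
\]

For simplicity, in the following we use $\rho = \min[\rho_\Phi, \rho_{A^0}]$ and $\beta = \max[\beta_\Phi, \beta_{A^0}]$; then, 

\[
I_{21}^T \le \frac{1}{T}\, \kappa' \int_{T_\omega}^{T} \Big( (t - t_0) \int_{t_0}^{T_\omega} e^{-2\rho(t - T_\omega)}\, e^{-2\rho(s - t_0)}\, ds \Big) dt, \quad t_0 \le s \le T_\omega \le t \le T, \tag{54}
\]

\[
\text{where } \kappa' = \beta^4 \big(M_\Delta^{T_\omega}\big)^2 \big(M_\Phi^{T_\omega}\big)^2. \tag{55}
\]

Evaluating the integrals in (54), the inner integral is at most $1/(2\rho)$ and the outer integral is at most $(T_\omega - t_0)/(2\rho) + 1/(4\rho^2)$, so that 

\[
I_{21}^T \le \frac{1}{T}\, \kappa'' \Big( \frac{T_\omega - t_0}{2\rho} + \frac{1}{4\rho^2} \Big), \tag{56}
\]

for a suitable constant $\kappa''$ (here $\kappa'' = \kappa'/(2\rho)$), and for fixed $T_\omega$ the right hand side tends to 0 as $T \to \infty$. 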

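(iv) $I_2^T = I_{21}^T + I_{22}^T$; Convergence of $I_{22}^T$: Employing the hypothesis $A_t \to A^0$ w.p.1 as $t \to \infty$, we shall fix $T_\omega$ 
such that $\|\Delta_s\| < \epsilon$ for $s \ge T_\omega$. 

For $I_{22}^T$, applying Lemma A.1 for $T_\omega \le s \le t \le T$, we obtain 

\begin{align}
I_{22}^T &\le \frac{1}{T}\, \epsilon^2 \beta^4 \int_{T_\omega}^{T} \Big( (t - t_0) \int_{T_\omega}^{t} e^{-2\rho_\Phi(t - s)}\, e^{-2\rho_{A^0}(s - t_0)}\, ds \Big) dt, \quad t_0 \le T_\omega \le s \le t \le T, \tag{57}\\
&\le \frac{1}{T}\, \epsilon^2 \beta^4 \int_{T_\omega}^{T} \Big( (t - t_0) \int_{T_\omega}^{t} e^{-2\rho(t - t_0)}\, ds \Big) dt, \quad \rho := \min[\rho_\Phi, \rho_{A^0}], \tag{58}\\
&= \frac{1}{T}\, \epsilon^2 \beta^4\, g(\cdot), \tag{59}
\end{align}

where $g(\cdot)$ is a bounded continuous function. Then, $\limsup_{T\to\infty} I_{22}^T = 0$. Therefore $\limsup_{T\to\infty} I_2^T \le \limsup_{T\to\infty} I_{21}^T + 
\limsup_{T\to\infty} I_{22}^T = 0$. 

Since we have established that $I_1^T \to 0$, $I_2^T \to 0$ w.p.1 as $T \to \infty$, we obtain $\limsup_{T\to\infty} I^T \le \limsup_{T\to\infty} I_1^T + 
\limsup_{T\to\infty} I_2^T = 0$. 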

Hence, we have proved that 

\[
\lim_{T\to\infty} \frac{1}{T} \int_{t_0}^T \|\Phi_{t,t_0} - \Psi_{t,t_0}\|^2\, dt = 0, \quad \text{w.p.1.} \tag{60}
\]

Lemma A.3: [26, Duncan et al. (1999)] Assume the process $(\epsilon(t),\ t \ge 0)$ is an $\mathbb{R}^m$-valued standard Wiener 
process that is independent of $(w(t),\ t \ge 0)$, and assume the countable set of random processes $\{(\epsilon(t + k) - \epsilon(k),\ t \in 
(0,1]);\ k \in \mathbb{N}\}$ to be mutually independent and all members of the set have the same probability law on $(0, 1]$. 
Then, for all $f(\cdot) \in L^\infty[0,\infty)$: 

\[
\lim_{T\to\infty} \frac{1}{T} \int_0^T \Big\| \sum_{k=0}^{\lfloor t \rfloor} \int_k^{\min[t, k+1]} f(\tau)\, \xi_k\, [\epsilon(\tau) - \epsilon(k)]\, d\tau \Big\|^2 dt = 0 \quad \text{w.p.1.}
\]

The proof is given in [26, Lemma 5]. 

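Lemma A.4: [28, Chen and Guo (1991)] Let $A \in \mathbb{R}^{n^2}$ be an asymptotically stable matrix, and let $D$ be a constant 
matrix of compatible dimension. Then 

\[
\limsup_{T\to\infty} \frac{1}{T} \int_0^T \Big\| \int_0^t e^{A(t - \tau)} D\, dw(\tau) \Big\|^2 dt = \int_0^\infty \mathrm{Tr}\big(e^{At} D D^T e^{A^T t}\big)\, dt.
\]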

Proof (after [28]): 
Consider the stochastic differential equation 

\[
dx_t = A x_t\, dt + D\, dw_t, \quad t \ge 0. \tag{61}
\]

Since $A$ is asymptotically stable, there exists a positive definite matrix $\Pi > 0$ such that 

\[
\Pi A + A^T \Pi = -I. \tag{62}
\]

Following [28], applying the Ito formula to the Lyapunov function $x_t^T \Pi x_t$, $t \ge 0$, 

\begin{align}
d[x_t^T \Pi x_t] &= x_t^T (\Pi A + A^T \Pi) x_t\, dt + \mathrm{Tr}(\Pi D D^T)\, dt + 2 x_t^T \Pi D\, dw_t \tag{63}\\
&= -\|x_t\|^2\, dt + \mathrm{Tr}(\Pi D D^T)\, dt + 2 x_t^T \Pi D\, dw_t. \tag{64}
\end{align}

Integrating (64) and using the result in Lemma 4 of Christopeit [41] to estimate the third term on the RHS of 
(63), we obtain 

\[
x_t^T \Pi x_t \le -\int_0^t \|x_s\|^2\, ds + \mathrm{Tr}(\Pi D D^T)\, t + o\Big( \Big( \int_0^t \|x_s\|^2\, ds \Big)^{1/2 + \eta} \Big) + O(1), \quad \text{where } 0 < \eta < \tfrac{1}{2}, \tag{65}
\]

and hence, $\int_0^t \|x_s\|^2\, ds = O(t)$ w.p.1. 

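We apply the Ito formula to the outer product $x_t x_t^T$, $t \ge 0$, 

\[
d[x_t x_t^T] = x_t x_t^T A^T\, dt + A x_t x_t^T\, dt + D D^T\, dt + D\, dw_t\, x_t^T + x_t\, dw_t^T D^T. \tag{66}
\]

Integrating the outer product $x_t x_t^T$ from $t = 0$ yields 

\[
x_t x_t^T = \Big( \int_0^t x_s x_s^T\, ds \Big) A^T + A \Big( \int_0^t x_s x_s^T\, ds \Big) + (D D^T)\, t + \int_0^t (D\, dw_s\, x_s^T) + \int_0^t (x_s\, dw_s^T D^T). \tag{67}
\]

A "Lyapunov integral move" yields 

\[
\int_0^t x_s x_s^T\, ds = \int_0^t e^{A(t - s)} (D D^T)\, s\, e^{A^T(t - s)}\, ds + \int_0^t e^{A(t - s)} \Big( \int_0^s \big\{ (D\, dw_\tau\, x_\tau^T) + (x_\tau\, dw_\tau^T D^T) \big\} \Big) e^{A^T(t - s)}\, ds. \tag{68}
\]

We deal with the second term of the RHS of (68). Using Christopeit's [41] estimate again we write, 

\begin{align}
& \Big\| \int_0^t e^{A(t - s)} \Big( \int_0^s \big\{ (D\, dw_\tau\, x_\tau^T) + (x_\tau\, dw_\tau^T D^T) \big\} \Big) e^{A^T(t - s)}\, ds \Big\| \tag{69}\\
&\quad = O\Big( \int_0^t e^{-2\rho(t - s)} \Big[ o\Big( \Big\{ \int_0^s \|x_\tau\|^2\, d\tau \Big\}^{1/2 + \eta} \Big) + O(1) \Big] ds \Big), \quad \text{where } 0 < \eta < \tfrac{1}{2}, \tag{70}\\
&\quad = \int_0^t e^{-2\rho(t - s)}\, o\big( s^{1/2 + \eta} \big)\, ds = o\big( t^{1/2 + \eta} \big), \quad \eta > 0. \tag{71}
\end{align}

As $\lim_{t\to\infty} \frac{1}{t} \int_0^t e^{-2\rho(t - s)}\, s\, ds = \int_0^\infty e^{-2\rho s}\, ds$, we take the time average limit of (68) and get 

\[
\lim_{t\to\infty} \frac{1}{t} \int_0^t x_s x_s^T\, ds = \int_0^\infty e^{As} (D D^T) e^{A^T s}\, ds,
\]

which implies 

\[
\limsup_{T\to\infty} \frac{1}{T} \int_0^T \Big\| \int_0^t e^{A(t - \tau)} D\, dw(\tau) \Big\|^2 dt = \int_0^\infty \mathrm{Tr}\big(e^{At} D D^T e^{A^T t}\big)\, dt. \tag{72}
\]

Thus we obtain the desired result. 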
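
The identity of Lemma A.4 can be checked numerically in the scalar case, where for $A = -a$ and $D = d$ the right hand side equals $d^2/(2a)$. The sketch below uses an Euler-Maruyama discretisation; the step size, horizon, seed and function name are assumptions of this example, and the discretisation introduces a small bias.

```python
import math
import random

def time_average_energy(a=1.0, d=1.0, T=2000.0, dt=0.01, seed=0):
    """Euler-Maruyama for dx = -a x dt + d dw with x(0) = 0.

    Returns the time average (1/T) * integral_0^T x_t^2 dt, which for
    x(0) = 0 is the quantity on the left hand side of the lemma.
    """
    rng = random.Random(seed)
    x, acc = 0.0, 0.0
    sqrt_dt = math.sqrt(dt)
    for _ in range(int(T / dt)):
        acc += x * x * dt                                     # left Riemann sum of x^2
        x += -a * x * dt + d * sqrt_dt * rng.gauss(0.0, 1.0)  # one Euler-Maruyama step
    return acc / T

# Compare the simulated time average with d^2 / (2 a) = 0.5.
print(time_average_energy(), 1.0 / 2.0)
```

For a long horizon the simulated time average is close to the trace integral, up to sampling fluctuation and discretisation error.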



Appendix B 



Proof of Lemma 3.1 



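We drop the subscript $i$ for clarity. By definition, when $\hat\theta_t$ is the solution to the RWLS equations (9), $\hat\theta^{pr}_t$ satisfies 
$\hat\theta^{pr}_t = \arg\min_{\psi \in \Theta} \|\hat\theta_t - \psi\|$, employing a co-ordinate ordering measurable tie breaking rule, if necessary. Since 
$\hat\theta_t \in \mathbb{R}^{n(n+m+(n+1)/2)}$, $\hat\theta^{pr}_t \in \Theta$ and $\theta^0 \in \Theta$, the definition of $\hat\theta^{pr}_t$ gives $\|\hat\theta_t - \hat\theta^{pr}_t\| \le \|\hat\theta_t - \theta^0\|$. By the triangle 
inequality, $\|\hat\theta^{pr}_t - \theta^0\| \le \|\hat\theta^{pr}_t - \hat\theta_t\| + \|\hat\theta_t - \theta^0\| \le 2\|\hat\theta_t - \theta^0\|$. But by hypothesis, 
$\|\hat\theta_t - \theta^0\| \to 0$ w.p.1 as $t \to \infty$; therefore, $\hat\theta^{pr}_t \to \theta^0$ w.p.1 as $t \to \infty$. ■ 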



Proof of Theorem 3.2 

(i) Since the solution $\Pi_\theta \in \mathbb{R}^{n^2}$, $\theta \in \mathbb{R}^{n(n+m+(n+1)/2)}$, to the algebraic Riccati equation (3) parametrized by 
$\theta \in \Theta$ is a smooth function of $\theta$ (see [42]), and since $\hat\theta^{pr}_{i,t} \in \Theta$, $t \ge 0$, $1 \le i \le N$, $\hat\Pi_{i,t} > 0$ satisfies 
$\Pi(\hat\theta^{pr}_{i,t}) < \infty$ w.p.1 for all $t \ge 0$. It is given that $x^* \in C_b[0,\infty)$; therefore, $s(t; \hat\theta^{pr}_{i,t}, \hat\zeta^{N_0}_{i,t}) < \infty$ w.p.1 for $t \ge 0$ 
evaluated along $\hat\theta^{pr}_{i,t}$, $t \ge 0$. Hence, $\hat u_i^0(t; \hat\theta^{pr}_{i,t}, \hat\zeta^{N_0}_{i,t})$ given in (17) is well defined. 

(ii) The strong consistency of the dynamical parameter estimates $[\hat A_{i,t}, \hat B_{i,t}]$, $t \ge 0$, $1 \le i \le N$, is shown in [26] 
under the controllability and observability of the true parameters (A1 and A2 in [26]) and the uniform controllability 
and observability of the estimates (Definition 1 in [26]). In our work, the controllability and observability assumptions 
are satisfied since $[A_\theta, B_\theta]$ is controllable and $[Q_\theta^{1/2}, A_\theta]$ is observable for all $\theta \in \Theta$ by A2, and moreover, the 
uniform controllability and observability of the estimates are satisfied due to Lemma 3.1. 

(iii) Dropping the subscript $i$ for clarity we set the estimation vector $v_t^T = [\Pi_t, s(t)]$ and the regression vector 
as $\psi_t^T = [x_t^T, 1]$. The persistence of excitation is satisfied since 

\[
\liminf_{T\to\infty} \frac{1}{T}\, \lambda_{\min} \Big[ \int_0^T \psi_t \psi_t^T\, dt \Big] > 0.
\]

Setting the measurement vector to be $(-(\hat B_t^T)^{-1} R u_t)^T$ and employing A6' we get $[\hat\Pi_t, \hat s(t)] \to [\Pi^0, s^0(t)]$ w.p.1 
as $t \to \infty$. Also, as shown in Part (ii), $[\hat A_t, \hat B_t] \to [A^0, B^0]$ w.p.1 as $t \to \infty$. Each estimated parameter in 
$\hat Q_t = -\hat A_t^T \hat\Pi_t - \hat\Pi_t \hat A_t + \hat\Pi_t \hat B_t R^{-1} \hat B_t^T \hat\Pi_t$ converges to its true value as $t \to \infty$. Hence, $\hat Q_t \to Q^0$ w.p.1 as 
$t \to \infty$. 

We observe that instead of the random regularization method used in [25] and [26], we employ here the projection 
method (Lemma 3.1), which guarantees the uniform controllability and observability of the estimates. 



Proof of Theorem 3.3 

Recall that $\theta^{[1:N_0]} = \{\theta_j;\ j \in Obs(N)\}$ is an independently selected subset of $\theta^{[1:N]}$ of cardinality $N_0(N)$, and 
$\theta_i$, $1 \le i \le N$, is an independently distributed sequence with each $\theta_i$ having the density $f_\zeta(\theta)$, and hence $\theta^{[1:N_0]}$ 
possesses a density in product form. Consequently, the scaled log-likelihood function of $\zeta$ at $\theta^{[1:N_0]}$ is given by 
$L_{N_0}(\zeta) = L(\theta^{[1:N_0]}; \zeta) = -(1/N_0) \log \big( \prod_{j \in Obs(N)} f_\zeta(\theta_j) \big)$, $N_0 = |Obs(N)|$. Note that the subscript $i$ is suppressed 
for clarity. The maximum likelihood estimate of $\zeta$ given $\theta^{[1:N_0]}$ is then given by $\hat\zeta^{N_0} = \arg\min_{\zeta \in P} L(\theta^{[1:N_0]}; \zeta)$. 

Now, it has been established in Theorem 3.2 that $(\hat\theta_t^{[1:N_0]};\ t \ge 0)$ for each $N_0(N)$, $N \in \mathbb{Z}_1$, constitutes a strongly 
consistent estimate of $\theta^{[1:N_0]}$, i.e., $\lim_{t\to\infty} \hat\theta_t^{[1:N_0]} = \theta^{[1:N_0]}$ w.p.1. Based upon this, the proof of the theorem consists 
of an analysis of the convergence (as $N \to \infty$ and hence $N_0(N) \to \infty$, and $t \to \infty$) of the likelihood function 
$L(\hat\theta_t^{[1:N_0]}; \zeta)$ with $\hat\theta_t^{[1:N_0]}$ substituted for $\theta^{[1:N_0]}$, and hence of the associated sequence of estimators $(\hat\zeta_t^{N_0};\ N_0 \in \mathbb{Z}_1)$ 
to $\zeta^0$. 

First we present two lemmas that will be used in the sequel for the proof of Theorem 3.3. 

Convergence of the Likelihood Functions $L(\theta^{[1:N_0]}; \zeta)$: 
Lemma B.1: Subject to A7, A8 we have 

\[
L_{N_0}(\zeta) \triangleq L(\theta^{[1:N_0]}; \zeta) \to L_{\zeta^0}(\zeta) = -E_{\zeta^0} \log f_\zeta(\theta) \quad \text{w.p.1,}
\]

as $N_0 \to \infty$ uniformly over $\zeta \in P$. ■ 
The proof of Lemma B.1 is given later in Appendix B. 

Lemma B.2: $L_{\zeta^0}(\zeta^0) \le L_{\zeta^0}(\zeta)$ for all $\zeta \in P$, with equality holding if and only if $\zeta = \zeta^0$. ■ 
The proof of Lemma B.2 follows a standard argument. A typical treatment can be found in [43]. 
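
For intuition, the inequality of Lemma B.2 can be computed explicitly for a simple family. The unit-variance Gaussian location family used below is an assumed example; it ignores the compact-support and lower-bound requirements of A7 and serves only to show the mechanism.

```latex
% Assumed example: f_zeta(theta) = (2 pi)^{-1/2} exp(-(theta - zeta)^2 / 2), theta real.
\begin{align*}
L_{\zeta^0}(\zeta) &= -E_{\zeta^0} \log f_\zeta(\theta)
  = \tfrac{1}{2}\log(2\pi) + \tfrac{1}{2}\,E_{\zeta^0}(\theta - \zeta)^2
  = \tfrac{1}{2}\log(2\pi) + \tfrac{1}{2}\big(1 + (\zeta - \zeta^0)^2\big),\\
L_{\zeta^0}(\zeta) - L_{\zeta^0}(\zeta^0) &= \tfrac{1}{2}(\zeta - \zeta^0)^2 \ge 0,
  \quad \text{with equality if and only if } \zeta = \zeta^0.
\end{align*}
```

The difference $L_{\zeta^0}(\zeta) - L_{\zeta^0}(\zeta^0)$ is the Kullback-Leibler divergence between $f_{\zeta^0}$ and $f_\zeta$, which is why identifiability (A8) is exactly what makes the minimiser unique.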

Convergence of the Functions $L(\hat\theta_t^{[1:N_0]}; \hat\zeta_t^{N_0})$: Now $P$ is a compact set, so it is sequentially compact [44], 
and the sequence $(\hat\zeta_t^{N_0};\ N_0 \in \mathbb{Z}_1)$ has a convergent subsequence $(\hat\zeta_t^{N_M};\ M \in \mathbb{Z}_1)$ for all $t \ge 0$, for which 
$\lim_{M\to\infty} \lim_{t\to\infty} \hat\zeta_t^{N_M} = \zeta^* \in P$, in the topology of $P$. Further, we observe that $\zeta^*$ is a $\mathcal{B}$-measurable $P$-valued 
random variable. We will adopt the notation $(\hat\zeta^{N_0};\ N_0 \in \mathbb{Z}_1) = (\lim_{t\to\infty} \hat\zeta_t^{N_0};\ N_0 \in \mathbb{Z}_1)$ in order to denote the 
sequence of MLE estimates indexed by the size of the population. 

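We will show that $L_{\zeta^0}(\zeta^*) \le L_{\zeta^0}(\zeta^0)$ for any $\zeta^0 \in P$. This, together with Lemma B.2 with $\zeta$ set equal to $\zeta^*$, 
gives $L_{\zeta^0}(\zeta^*) \le L_{\zeta^0}(\zeta^0) \le L_{\zeta^0}(\zeta^*)$. The Identifiability Condition A8 gives $\zeta^0 = \zeta^*$ w.p.1 and we conclude that 
all subsequential limits of $(\hat\zeta^{N_0};\ N_0 \in \mathbb{Z}_1)$ equal $\zeta^0$ w.p.1, and hence $\lim_{N\to\infty} \lim_{t\to\infty} \hat\zeta_t^{N_0} = \zeta^0$ w.p.1. 

(i) To show $L_{\zeta^0}(\zeta^*) \le \lim_{t\to\infty} L_{\zeta^0}(\hat\zeta_t^{N_0}) + \epsilon/3$: For any $\epsilon > 0$, there exists an almost surely finite random integer 
$N_1(\omega, \epsilon)$ so that for all $M$ such that $N_M > N_1(\omega, \epsilon)$, the estimate $\hat\zeta^{N_M} = \lim_{t\to\infty} \hat\zeta_t^{N_M}$ lies in a neighbourhood 
$\mathcal{N}_\epsilon(\zeta^*)$ of $\zeta^*$ for which $|L_{\zeta^0}(\zeta) - L_{\zeta^0}(\zeta^*)| < \epsilon/3$, for all $\zeta \in \mathcal{N}_\epsilon(\zeta^*)$, by the continuity of $L_{\zeta^0}(\cdot)$ on $P$. The 
uniform continuity of $L_{\zeta^0}$ on $P$ is shown as follows: pick arbitrary $\zeta, \zeta' \in P \subset \mathbb{R}^p$ such that $\zeta' \in P$ lies in a 
coordinate neighborhood $\mathcal{N}_\delta(\zeta)$ of $\zeta \in P$. We have 

\begin{align*}
|L_{\zeta^0}(\zeta) - L_{\zeta^0}(\zeta')| &= |-E_{\zeta^0} \log f_\zeta(\theta) + E_{\zeta^0} \log f_{\zeta'}(\theta)|\\
&\le \int_\Theta |\log f_\zeta(\theta) - \log f_{\zeta'}(\theta)|\, f_{\zeta^0}(\theta)\, d\theta.
\end{align*}

Hence for some $\zeta'' \in P$ in the line segment $\{\lambda\zeta + (1 - \lambda)\zeta';\ \lambda \in (0, 1)\}$, the Mean Value Theorem yields 

\[
|L_{\zeta^0}(\zeta) - L_{\zeta^0}(\zeta')| \le \int_\Theta \frac{1}{f_{\zeta''}(\theta)} \Big\| \frac{\partial f_{\zeta''}}{\partial \zeta}(\theta) \Big\|\, \|\zeta - \zeta'\|\, f_{\zeta^0}(\theta)\, d\theta. \tag{73}
\]

But by A7, $f_{\zeta''}(\theta) > \delta$ for all $\zeta'' \in P$. Then by (73), for each $\epsilon > 0$, there exists $\delta_\epsilon > 0$ such that for all $\zeta, \zeta' \in P$, 
$\|\zeta - \zeta'\| < \delta_\epsilon$ implies $|L_{\zeta^0}(\zeta) - L_{\zeta^0}(\zeta')| < \epsilon$. Hence, $L_{\zeta^0}(\zeta)$ is uniformly continuous over $P$. 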

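(ii) To show $\lim_{t\to\infty} L_{\zeta^0}(\hat\zeta_t^{N_0}) \le \lim_{t\to\infty} L(\hat\theta_t^{[1:N_0]}; \hat\zeta_t^{N_0}) + \epsilon/3$, for all $N > N_2(\omega, \epsilon)$ for some random $N_2(\omega, \epsilon) \in 
\mathbb{Z}_1$: Lemma B.1 assures us that we can pick an almost surely finite random integer $N_2(\omega, \epsilon) \in \mathbb{Z}_1$ so that for all 
$N > N_2(\omega, \epsilon)$, we have $|L(\theta^{[1:N_0]}; \zeta) - L_{\zeta^0}(\zeta)| < \epsilon/3$ for all $\zeta \in P$, where $N_0 = |Obs(N)|$, $N_0 \to \infty$ as $N \to \infty$, 
and where $\theta^{[1:N_0]} = \lim_{t\to\infty} \hat\theta_t^{[1:N_0]}$. But, from the continuity of $L_{N_0}(\cdot)$, $N_0 \in \mathbb{Z}_1$, we have $\lim_{t\to\infty} L(\hat\theta_t^{[1:N_0]}; \zeta) = 
L(\theta^{[1:N_0]}; \zeta)$ for all $\zeta \in P$, therefore the inequality holds. 

(iii) To show $\lim_{t\to\infty} L(\hat\theta_t^{[1:N_0]}; \hat\zeta_t^{N_0}) \le \lim_{t\to\infty} L(\hat\theta_t^{[1:N_0]}; \zeta)$, $\forall \zeta \in P$: This follows from (14) since we have 
$L(\theta^{[1:N_0]}; \hat\zeta^{N_0}) \le L(\theta^{[1:N_0]}; \zeta)$ for all $\zeta \in P$, where $\lim_{t\to\infty} L(\hat\theta_t^{[1:N_0]}; \hat\zeta_t^{N_0}) = L(\theta^{[1:N_0]}; \hat\zeta^{N_0})$, and $\lim_{t\to\infty} L(\hat\theta_t^{[1:N_0]}; \zeta) = 
L(\theta^{[1:N_0]}; \zeta)$. 

(iv) To show $\lim_{t\to\infty} L(\hat\theta_t^{[1:N_0]}; \zeta) \le L_{\zeta^0}(\zeta) + \epsilon/3$, $\forall \zeta \in P$: Again, we employ Lemma B.1: pick an almost 
surely finite random integer $N_3(\omega, \epsilon) \in \mathbb{Z}_1$ so that for all $N > N_3(\omega, \epsilon)$ we have $|L(\theta^{[1:N_0]}; \zeta) - L_{\zeta^0}(\zeta)| < \epsilon/3$ 
for all $\zeta \in P$, and let $\hat\theta_t^{[1:N_0]} \to \theta^{[1:N_0]}$ as $t \to \infty$. 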

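Combining (i)-(iv) yields 

\begin{align*}
L_{\zeta^0}(\zeta^*) &\le \lim_{t\to\infty} L_{\zeta^0}(\hat\zeta_t^{N_0}) + \frac{\epsilon}{3} \le \lim_{t\to\infty} L(\hat\theta_t^{[1:N_0]}; \hat\zeta_t^{N_0}) + \frac{2\epsilon}{3}\\
&\le \lim_{t\to\infty} L(\hat\theta_t^{[1:N_0]}; \zeta) + \frac{2\epsilon}{3}\\
&\le L_{\zeta^0}(\zeta) + \epsilon \quad \text{w.p.1 for all } \zeta \in P,
\end{align*}

for all $N_M > \max(N_1, N_2, N_3)(\omega, \epsilon)$. 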

Convergence of the Sequence of Estimators $(\hat\zeta^{N_0};\ N_0 \in \mathbb{Z}_1)$: Evaluating the relation above at $\zeta = \zeta^0$ yields 
$L_{\zeta^0}(\zeta^*) \le L_{\zeta^0}(\zeta^0) + \epsilon$ w.p.1. But this expression is independent of $N_M$, and $\epsilon$ is arbitrary. So, $L_{\zeta^0}(\zeta^*) \le L_{\zeta^0}(\zeta^0)$ 
w.p.1. However, as stated in Lemma B.2, $L_{\zeta^0}(\zeta^0) \le L_{\zeta^0}(\zeta)$ w.p.1 for all $\zeta \in P$, with equality 
holding if and only if $\zeta = \zeta^0$. Therefore, $L_{\zeta^0}(\zeta^*) \le L_{\zeta^0}(\zeta^0) \le L_{\zeta^0}(\zeta^*)$ w.p.1, implying $\zeta^0 = \zeta^*$ w.p.1 by the 
Identifiability Condition A8, and so all subsequential limits of $(\hat\zeta^{N_0};\ N_0 \in \mathbb{Z}_1) = (\lim_{t\to\infty} \hat\zeta_t^{N_0};\ N_0 \in \mathbb{Z}_1)$ equal 
$\zeta^0$ w.p.1, or equivalently $\lim_{N\to\infty} \lim_{t\to\infty} \hat\zeta_t^{N_0} = \zeta^0$ w.p.1. ■ 

Proof of Lemma B.1 

By A7 the family of densities $\{f_\zeta(\theta);\ \theta \in \Theta,\ \zeta \in P\}$ exists for the family of dynamical and cost function 
parameter distributions $\{F_\zeta(\cdot);\ \zeta \in P\}$. Let $f(\theta^{[1:N_0]}; \zeta)$, where $N_0 = |Obs(N)|$, be the likelihood function of $f_\zeta$ 
at $\theta^{[1:N_0]}$ and let $L_{N_0}(\zeta) = L(\theta^{[1:N_0]}; \zeta)$ be the continuously differentiable function of $f(\theta^{[1:N_0]}; \zeta)$ given by the 
scaled log-likelihood function 

\[
L_{N_0}(\zeta) \triangleq L(\theta^{[1:N_0]}; \zeta) \triangleq -(1/N_0) \log f(\theta^{[1:N_0]}; \zeta) = -(1/N_0) \log \Big( \prod_{j \in Obs(N)} f_\zeta(\theta_j) \Big),
\]

$N_0 = |Obs(N)|$, where $\theta^{[1:N_0]} = \{\theta_j;\ j \in Obs(N)\}$. 

The random sequence $L(\zeta) = (L(\theta^{[1:N_0]}; \zeta);\ N_0 \in \mathbb{Z}_1)$ converges w.p.1 for each $\zeta \in P$ [43], where $P$ is a 
compact set by A7. Then, in order for the almost sure convergence of $L(\zeta)$ to be uniform over $P$, it is sufficient that 
the process $((\partial L_{N_0}/\partial\zeta)(\zeta);\ N_0 \in \mathbb{Z}_1)$ exists as a sequence of random variables which is w.p.1 bounded uniformly 
over $P$, where $L_{N_0}(\zeta) = L(\theta^{[1:N_0]}; \zeta)$. This may be shown as follows by the Mean Value Theorem: 

\[
L_{K_0,L_0}(\zeta') \triangleq L_{K_0}(\zeta') - L_{L_0}(\zeta') = L_{K_0,L_0}(\zeta) + \frac{\partial L_{K_0,L_0}}{\partial\zeta}(\zeta'')(\zeta' - \zeta),
\]

where $\zeta' \in P$ lies in an $\epsilon$-coordinate neighbourhood $\mathcal{N}_\epsilon(\zeta)$ of $\zeta \in P$ and $\zeta''$ lies on the line segment $\{\lambda\zeta + (1 - 
\lambda)\zeta';\ \lambda \in (0,1)\}$. Consequently, 

\[
|L_{K_0,L_0}(\zeta')| \le |L_{K_0,L_0}(\zeta)| + \Big\| \frac{\partial L_{K_0,L_0}}{\partial\zeta}(\zeta'') \Big\|\, \|\zeta' - \zeta\| \quad \text{with } \|\zeta' - \zeta\| < \epsilon,
\]

where the differentiability of $(L_{N_0};\ N_0 \in \mathbb{Z}_1)$ follows from its definition. Let each such $\mathcal{N}_\epsilon(\zeta) \subset P$, choosing a 
smaller $\epsilon = \epsilon(\zeta)$, possibly depending upon $\zeta$, if necessary. Then take an open cover of the compact set $P$ by these 
$\epsilon$-neighbourhoods and let $\{\mathcal{N}^i(\zeta^i);\ 1 \le i \le M\}$ be a finite subcover. By A7 for each $j$, $1 \le j \le p$, $(\partial f_\zeta/\partial\zeta_j)(\theta; \zeta)$ 
is bounded uniformly for all $\theta \in \Theta$. Therefore, $\sup_{\zeta \in P} (\|(\partial L_{N_0}/\partial\zeta)(\zeta)\|;\ N_0 \in \mathbb{Z}_1) < K$. Moreover, by the 
convergence of the sequences $\{L^i_{N_0}(\zeta^i);\ 1 \le i \le M\}$ w.p.1 and the boundedness of $\{\|(\partial L_{N_0}/\partial\zeta)(\zeta)\|;\ N_0 \in \mathbb{Z}_1\}$ 
by $K$ uniformly over $\zeta \in P$, we obtain $|L_{K_0,L_0}(\zeta')| < \epsilon + 2K\epsilon$ w.p.1 for all $\zeta' \in P$ for all $K_0, L_0 > N(\omega, \epsilon)$ 
for some random $N(\omega, \epsilon) \in \mathbb{Z}_1$. But this shows that $(L_{N_0}(\zeta);\ N_0 \in \mathbb{Z}_1)$ satisfies the Cauchy convergence criterion 
w.p.1 uniformly over $P$. Therefore $L_{N_0} = L(\theta^{[1:N_0]}; \zeta) \to L_{\zeta^0}(\zeta) = -E_{\zeta^0} \log f_\zeta(\theta)$ w.p.1, as $N_0 \to \infty$ uniformly 
over $P$, where $N_0 = |Obs(N)|$, and hence as $N \to \infty$ uniformly over $P$. ■ 

Appendix C 

Proof of Proposition 4.1 

1) Proof of $\lim_{N\to\infty} \lim_{t\to\infty} x^*(\tau, \hat\zeta_t^{N_0}) = x^*(\tau, \zeta^0)$ w.p.1, $t \le \tau < \infty$: 

Recall that $x^*(\tau, \zeta^0)$, $t \le \tau < \infty$, is the solution to the MF Equation System (6), and $x^*(\tau, \hat\zeta_t^{N_0})$, $t \le \tau < \infty$, is 
the solution to the MF-SAC Equation System (18). Note that the subscript $i$ is suppressed for clarity. A contraction 
mapping argument together with A1-A4 ensure the existence and uniqueness of $x^*(\cdot, \zeta^0) \in C_b[0, \infty)$ (see [2]). A1-A4 
also hold for $x^*(\tau, \hat\zeta_t^{N_0})$, $t \le \tau < \infty$, $t \ge 0$, by Lemma 3.1; therefore, the existence and uniqueness properties 
also hold for $x^*(\tau, \hat\zeta_t^{N_0})$ for $t \ge 0$. Since $x^*(\cdot, \zeta)$ is a continuous function of $\zeta$ on $P$, and by Theorem 3.3, 
$\lim_{N\to\infty} \lim_{t\to\infty} \hat\zeta_t^{N_0} = \zeta^0$ w.p.1, $\lim_{t\to\infty} \sup_{t \le \tau < \infty} \|x^*(\tau, \hat\zeta_t^{N_0}) - x^*(\tau, \zeta^0)\| = O(\epsilon_1(N))$, where $\epsilon_1(N) \to 0$ 
as $N \to \infty$. Therefore, 

\[
\lim_{N\to\infty} \lim_{t\to\infty} \big\| x^*(\tau, \hat\zeta_t^{N_0}) - x^*(\tau, \zeta^0) \big\| = 0 \quad \text{w.p.1}, \quad t \le \tau < \infty.
\]


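2) Proof of $\lim_{N\to\infty}\lim_{t\to\infty} s(t; \hat\theta_t, \hat\zeta_t^{N_0}) = s(t; \theta^0, \zeta^0)$ w.p.1:
The solution to the differential equation (4) is given by

$$s(t; \theta^0, \zeta^0) = -\int_t^\infty \Psi_{\tau,t}^T\, Q(\theta^0)\, x^*(\tau, \zeta^0)\, d\tau,$$

where $\frac{d}{dt}\Psi_{t,t_0} = A_*(\theta^0)\Psi_{t,t_0}$, $\Psi_{t_0,t_0} = I$, $x^*$ is generated by the MF equation system (6), and $A_* = (A - BR^{-1}B^T\Pi)$. For the certainty equivalence offset function $s(t; \hat\theta_t, \hat\zeta_t)$ generated by the MF-SAC Law, we have

$$s(t; \hat\theta_t, \hat\zeta_t^{N_0}) = -\int_t^\infty \Phi^T(\tau, t, \theta^0, \hat\theta^{t,\tau})\, Q(\hat\theta_t)\, x^*(\tau, \hat\zeta_t^{N_0})\, d\tau,$$

where $\frac{d}{dt}\Phi_{t,t_0} = A_*(\hat\theta_t)\Phi_{t,t_0}$, $\Phi_{t_0,t_0} = I$. We adopt the notation $\Phi_{\tau,t} \triangleq \Phi(\tau, t, \theta^0, \hat\theta^{t,\tau})$, $\Psi_{\tau,t} \triangleq \Psi(\tau, t, \theta^0)$ and obtain

$$I^{N,t} \triangleq \Big\|\int_t^\infty \Phi_{\tau,t}^T Q(\hat\theta_t) x^*(\tau, \hat\zeta_t^{N_0})\, d\tau - \int_t^\infty \Psi_{\tau,t}^T Q(\theta^0) x^*(\tau, \zeta^0)\, d\tau\Big\|.$$

Adding and subtracting $\int_t^\infty \Phi_{\tau,t}^T Q(\hat\theta_t) x^*(\tau, \zeta^0)\, d\tau$ and $\int_t^\infty \Phi_{\tau,t}^T Q(\theta^0) x^*(\tau, \zeta^0)\, d\tau$, and using the triangle inequality we get

$$\begin{aligned}
I^{N,t} \le{}& \Big\|\int_t^\infty \Phi_{\tau,t}^T Q(\hat\theta_t) x^*(\tau, \hat\zeta_t^{N_0})\, d\tau - \int_t^\infty \Phi_{\tau,t}^T Q(\hat\theta_t) x^*(\tau, \zeta^0)\, d\tau\Big\| \\
&+ \Big\|\int_t^\infty \Phi_{\tau,t}^T Q(\hat\theta_t) x^*(\tau, \zeta^0)\, d\tau - \int_t^\infty \Phi_{\tau,t}^T Q(\theta^0) x^*(\tau, \zeta^0)\, d\tau\Big\| \\
&+ \Big\|\int_t^\infty \Phi_{\tau,t}^T Q(\theta^0) x^*(\tau, \zeta^0)\, d\tau - \int_t^\infty \Psi_{\tau,t}^T Q(\theta^0) x^*(\tau, \zeta^0)\, d\tau\Big\| =: I_1^{N,t} + I_2^t + I_3^t.
\end{aligned}$$

(i) Convergence of $I_1^{N,t}$ and $I_2^t$: $\lim_{t\to\infty} I_1^{N,t} = O(\epsilon_1(N))$, where $\epsilon_1(N) \to 0$ as $N \to \infty$, and $\lim_{t\to\infty} I_2^t = 0$ follows from Lemma A.1 and Part 1 of the proof.

(ii) Convergence of $I_3^t$: From the proof of Lemma A.2,

$$I_3^t = \Big\|\int_t^\infty \Big[\int_t^\tau \Phi_{\tau,s}\big(A_*(\theta^0) - A_*(\hat\theta_s)\big) e^{A_*(\theta^0)(s-t)}\, ds\Big]^T Q(\theta^0)\, x^*(\tau, \zeta^0)\, d\tau\Big\|,$$

$t \le s \le \tau < \infty$. Lemma A.1 yields the bound $\|\Phi_{\tau,s}\| \le \beta_0 e^{-\rho_\Phi(\tau-s)}$, $t \le s \le \tau$, and for the time invariant case, $\|e^{A_*(\theta^0)(s-t)}\| \le \beta_1 e^{-\rho_{A^0}(s-t)}$, $t \le s$. For simplicity, set $\rho \le \min[\rho_\Phi, \rho_{A^0}]$ and let $T_\omega(\epsilon_2)$ be such that $\|A_*(\theta^0) - A_*(\hat\theta_s)\| < \epsilon_2$, $s > T_\omega(\epsilon_2)$. Then for $T_\omega(\epsilon_2) \le t \le s \le \tau < \infty$ we obtain

$$\begin{aligned}
I_3^t &\le \int_t^\infty \Big(\int_t^\tau \|\Phi_{\tau,s}\|\,\|A_*(\theta^0) - A_*(\hat\theta_s)\|\,\|e^{A_*(\theta^0)^T(s-t)}\|\, ds\Big)\|Q^0\|\,\|x^*(\tau, \zeta^0)\|\, d\tau \\
&\le \beta_0\beta_1\Lambda\,\epsilon_2\,\|Q^0\| \int_t^\infty \Big(\int_t^\tau e^{-\rho(\tau-s)} e^{-\rho(s-t)}\, ds\Big) d\tau,
\end{aligned} \tag{74}$$

where $Q^0 = Q(\theta^0)$. The bound (74) holds for arbitrarily small $\epsilon_2 > 0$ for all sufficiently large $t > T_\omega(\epsilon_2)$ by use of the bounds $\|\Phi_{\tau,s}\| \le \beta_0 e^{-\rho(\tau-s)}$, $\|e^{A_*(\theta^0)(s-t)}\| \le \beta_1 e^{-\rho(s-t)}$, and $\Lambda = \sup_\tau \|x^*(\tau, \zeta^0)\|$. Hence, $I_3^t \le \beta_0\beta_1\Lambda\,\epsilon_2\|Q^0\|\int_t^\infty (\tau - t)e^{-\rho(\tau-t)}\, d\tau = \kappa\epsilon_2$, w.p.1, where $\kappa = \beta_0\beta_1\Lambda\|Q^0\|/\rho^2$. By Theorem 3.2 $\|A_*(\theta^0) - A_*(\hat\theta_t)\| \to 0$ as $t \to \infty$; therefore, as $t \to \infty$, $\epsilon_2 \to 0$. Hence, we obtain $\lim_{t\to\infty} I_3^t = 0$ w.p.1.

In conclusion we have shown that $\lim_{t\to\infty} I_1^{N,t} = O(\epsilon_1(N))$, and therefore $\lim_{N\to\infty}\lim_{t\to\infty} I_1^{N,t} = 0$ w.p.1. In addition, $\lim_{t\to\infty} I_2^t = 0$ and $\lim_{t\to\infty} I_3^t = 0$. Therefore, $\lim_{N\to\infty}\lim_{t\to\infty} I^{N,t} = 0$ w.p.1. Hence, $\lim_{t\to\infty}\|s(t; \hat\theta_t, \hat\zeta_t^{N_0}) - s(t; \theta^0, \zeta^0)\| = O(\epsilon_1(N))$, and $\lim_{N\to\infty}\lim_{t\to\infty}\|s(t; \hat\theta_t, \hat\zeta_t^{N_0}) - s(t; \theta^0, \zeta^0)\| = 0$ w.p.1.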
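
The bound on $I_3^t$ rests on evaluating the double integral in (74) explicitly. As a worked check, using only the exponential bounds already stated and a common rate $\rho > 0$:

```latex
\begin{align*}
\int_t^\infty \Big( \int_t^\tau e^{-\rho(\tau-s)}\, e^{-\rho(s-t)}\,ds \Big)\, d\tau
  &= \int_t^\infty (\tau-t)\, e^{-\rho(\tau-t)}\,d\tau
     && \text{(the inner integrand equals } e^{-\rho(\tau-t)} \text{ for every } s\text{)} \\
  &= \int_0^\infty r\, e^{-\rho r}\,dr = \frac{1}{\rho^2}
     && \text{(substitute } r=\tau-t \text{ and integrate by parts).}
\end{align*}
```

This is where the factor $1/\rho^2$ in the constant multiplying $\epsilon_2$ comes from.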

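3) Proof of $u^0(t; \hat\theta_t^{pr}, \hat\zeta_t^{N_0}) = -R^{-1}\hat B_t^T\big(\Pi_t x_t + s(t; \hat\theta_t^{pr}, \hat\zeta_t^{N_0})\big) + \xi[\epsilon(t) - \epsilon(k)]$:

The solution $\Pi_\theta \in \mathbb{R}^{n^2}$, $\theta \in \mathbb{R}^{n(n+m+(n+1)/2)}$, to the algebraic Riccati equation (3) parametrized by $\theta \in \Theta$ is a smooth function of $\theta$ (see [42]). Hence, $(\Pi(\hat\theta_t^{pr});\ t \ge 0)$ satisfies $\Pi(\hat\theta_t^{pr}) < \infty$ w.p.1 for all $t \ge 0$ since $\hat\theta_t^{pr} \in \Theta$, $t \ge 0$. It is shown in Part 1 of the proof that the mass signal $x^*(\tau, \hat\zeta_t^{N_0}) \in C_b[0, \infty)$, $t \le \tau < \infty$; therefore, $s(t; \hat\theta_t, \hat\zeta_t^{N_0}) < \infty$ w.p.1 for all $t \ge 0$, evaluated along $\hat\theta_t^{pr}$, $t \ge 0$. Hence, $u^0(t; \hat\theta_t, \hat\zeta_t^{N_0})$ is well defined.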



Proof of Theorem 4.3 

We recall the following notation and basic assumptions: $\theta_i^0$ denotes the true dynamical parameter of agent $A_i$ that parametrizes the matrices $[A_i, B_i, Q_i] \in \Theta$, which are to be estimated by agent $A_i$, and $\hat\theta_{i,t} = [\hat A_{i,t}, \hat B_{i,t}, Q_i]$ is the estimated parameter of agent $A_i$ at time $t$. Note that $Q_i$ is in the information set of agent $A_i$ and therefore does not need to be estimated. We set $\hat\theta_i^{\tau,t} = (\hat\theta_{i,s},\ \tau \le s \le t)$, the sample path of the estimated parameter matrices from time $\tau$ to time $t$. The population distribution parameter denoted by $\zeta^0 \in P$, where $P$ is the parameter set for $F_\zeta(\cdot)$, parametrizes $F_\zeta(\cdot)$. Further, the estimated population distribution parameter of agent $A_i$ is denoted as $\hat\zeta_{i,t}^{N_0}$, and $\hat\zeta_i^{\tau,t}(N_0) = (\hat\zeta_{i,s}^{N_0},\ \tau \le s \le t)$ is the sample path of the estimated distribution parameter from time $\tau$ to $t$. As shown in Theorem 3.2, under A1-A3, on the underlying probability space, $(\hat\theta_{i,t};\ t \ge 0)$ converges w.p.1 to $\theta_i^0$ as $t \to \infty$, and by Theorem 3.3, under A7 and A8, $(\hat\zeta_{i,t}^{N_0};\ t \ge 0)$ converges w.p.1 to $\zeta^0$ as $t \to \infty$ and $N \to \infty$, $1 \le i \le N$. Note that for the optional PCPI, A6' also needs to be employed. In the sequel, $A_{\hat\theta_{i,t}}$, $B_{\hat\theta_{i,t}}$ will be used to denote the estimated dynamical parameters whereas $\Pi_{i,t}$ denotes the solution to (15). Since the solution $\Pi_\theta \in \mathbb{R}^{n^2}$, $\theta \in \mathbb{R}^{n(n+m+(n+1)/2)}$, to the algebraic Riccati equation (3) parametrized by $\theta \in \Theta$, is a smooth function of $\theta$ (see [42]), $\Pi_{\hat\theta_{i,t}}$, $t \ge 0$, satisfies $\|\Pi_{\hat\theta_{i,t}} - \Pi_{\theta_i^0}\| \to 0$ w.p.1 as $t \to \infty$, $1 \le i \le N$. To

October 22, 2012 



DRAFT 



28 



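establish the theorem we first observe $\hat x_i^0(t; \hat\theta_i^{0,t}, \hat\zeta_i^{0,t}(N_0))$, $t \ge 0$, $1 \le i \le N$, is the state of the system subject to the dithered MF Adaptive control law computed from the sum

$$\hat u_i^0(t; \hat\theta_{i,t}, \hat\zeta_{i,t}^{N_0}) = u_i^{loc}(t; \hat\theta_{i,t}) + u_i^{pop}(t; \hat\theta_{i,t}, \hat\zeta_{i,t}^{N_0}) + u_i^{dit}(t), \quad t \ge 0,\ 1 \le i \le N, \tag{75}$$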



where the control input due to the MF-SAC Law is given by 



The term 



will be decomposed into four parts, and convergence properties 



will be established for each term. We have 

r,0?) - [A e o - BeoK^BjoUeo}^ i(t,T,0^)dt, * 4 (t,t,0°) = I, 



and 



Also, in the sequel for clarity we will suppress the subscript i and adopt the notation: <I> t s = <!>(£, s, 9°, 9°'*), ^ t s — 



*{t,s,9°), A 4 A e o, B° 



B e0 , n 1 



A 



4 s (t ; eC°), A*) 4 ^(t ; ^,C°),A t 4 Ag B t 



Bg t , n t = U §t , s(t) = s(t;9 iit ,C$), £•(*) ^ x°(i;0° ! ',C°'*(^o))- Displaying the dependence of the fundamental 
matrix on the parameter estimate trajectory, we use the integral representation and, by the Cauchy-Schwarz
Inequality (henceforth termed the CS Inequality), we obtain



* t)O a;(0) - * tlO a;(0) 



dt 



+ ^ f [ $ tyT B°R- 1 B T s(t)dr - [ ¥ tiT B°R- 1 B° T «(i)dT 
2 Jo Jo Jo 



dt 



+ 



+ 



*t,o / *; Ddw;(T) - * t , / *; Ddw(r) dt 
Jo Jo 11 

L*J , r min[t,k+l] L*J 

* t , T B t ^[e(T)-e(fc)]dr) - £ 

fc=0 



min[i,/c+l] 



* t , r B°a[e(r)-e(fc)]dr) 



(76) 
(77) 
(78) 

(79) 



We will show one by one that the limits superior ($\limsup_{T\to\infty}$) of $I_1^T$ (76), $I_3^T$ (78) and $I_4^T$ (79) are all 0 with probability 1, and that the limit superior ($\limsup_{N\to\infty}\limsup_{T\to\infty}$) of $I_2^{N,T}$ (77) is equal to 0 with probability 1.

(i) Convergence of $I_1^T$ follows from Lemma A.2.



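(ii) Convergence of $I_2^{N,T}$ (77): Adding and subtracting $\Phi(t, \tau, \theta^0, \hat\theta^{\tau,t}) B^0 R^{-1} B_\tau^T s(\tau; \theta^0, \zeta^0)$ and $\Phi(t, \tau, \theta^0, \hat\theta^{\tau,t}) B^0 R^{-1} B^{0T} s(\tau; \theta^0, \zeta^0)$, using Lemma A.1, Lemma A.2, and

$$\sup_\tau \|s(\tau; \hat\theta_\tau, \hat\zeta_\tau^{N_0}) - s(\tau; \theta^0, \zeta^0)\| \le \epsilon_1(N),$$

from Proposition 4.1 we get $\limsup_{T\to\infty} I_2^{N,T} \le O((\epsilon_1(N))^2)$ w.p.1, which implies

$$\limsup_{N\to\infty}\limsup_{T\to\infty} I_2^{N,T} = 0 \quad \text{w.p.1.}$$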
(iii) Convergence of $I_3^T$ (78): We have



7j = — 



/ *(t,O,0°,0 r '*) / *-V,0,f^,f3 r '')Ddu;(T)-*(t,0,6> ) / * _1 (t, 0, o )Ddw(r) 
Jo Jo Jo 



dt, 
(80) 



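We use the notation $\Phi_{t,\tau} = \Phi(t, \tau, \theta^0, \hat\theta^{\tau,t})$, $\Psi_{t,\tau} = \Psi(t, \tau, \theta^0)$. Consider the stochastic differential equations

$$dx_t = A_*(\hat\theta_t)\, x_t\, dt + D\, dw(t), \quad x(0) = x_0,$$

$$dy_t = A_*(\theta^0)\, y_t\, dt + D\, dw(t), \quad y(0) = y_0,$$

where $x_0 = y_0 < \infty$ by A1. The difference $z_t = x_t - y_t$ satisfies

$$z_t = \Phi_{t,0}\int_0^t \Phi_{\tau,0}^{-1} D\, dw(\tau) - \Psi_{t,0}\int_0^t \Psi_{\tau,0}^{-1} D\, dw(\tau), \quad t \ge 0.$$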

Alternatively, one can write dz t = A*{6 )z t dt+[A*{6 Q )-A*{6t)]ytdt, giving z t = *t,t z +/o *t, s (A*(0 )- 



A*(0 s ))y s ds. Hence we can write (80) as /J = | J* T *t,t zo + /J * M (A„,(f5 ) - A^(9 s ))y a ds 



dt, and 



use the CS Inequality to obtain 



/ * t , to zo 'dt+| / / * M (A*(0°)-A*(0 s ))t/ s ds 
Jo J Jo Jo 



Now z = x - y a = 0, since x = y ; therefore, 7j < f f Q T f Q * <J> tiS (A*(f3°) - A»(# s ))y s ds 



dt + |, / T 



dt. Let 

2 

dt =: 



T w (e 2 ) be such that t > T w (e 2 ) implies \\6 t - 6°\\ < e 2 . Then, 7j < f f Q T ' 
$I_{31}^T + I_{32}^T$. Following Lemma A.1 we write $\|\Phi_{t,s}\| \le \beta_0 e^{-\rho_\Phi(t-s)}$ for $t \ge s \ge 0$. We also use the CS
Inequality, and let $\epsilon_2 \to 0$ as $t \to \infty$. We get $\limsup_{T\to\infty} I_{31}^T = 0$ and $\limsup_{T\to\infty} I_{32}^T = 0$ w.p.1. Therefore
$\limsup_{T\to\infty} I_3^T = 0$ w.p.1.
(iv) Convergence of $I_4^T$ (79): We have

,■/ L*J / r min[t,k+l] 



lim sup 



(* t , T -¥ t , T )B°& [e(r) - dr 



dt. 



This term is treated by a direct application of Lemma A.3; therefore, the time-average integral
tends to 0.

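Overall, we have shown that $\limsup_{T\to\infty} I_1^T = 0$, $\limsup_{T\to\infty} I_2^{N,T} = O((\epsilon_1(N))^2)$, $\limsup_{T\to\infty} I_3^T = 0$ and $\limsup_{T\to\infty} I_4^T = 0$. This implies $\limsup_{T\to\infty} (1/T)\int_0^T \|\hat x^0(t; \hat\theta^{0,t}, \hat\zeta^{0,t}(N_0)) - x^0(t; \theta^0, \zeta^0)\|^2\, dt = O((\epsilon_1(N))^2)$ w.p.1. Consequently, $\lim_{N\to\infty}\lim_{T\to\infty} (1/T)\int_0^T \|\hat x_i^0(t; \hat\theta_i^{0,t}, \hat\zeta_i^{0,t}(N_0)) - x_i^0(t; \theta_i^0, \zeta^0)\|^2\, dt = 0$ w.p.1, $1 \le i \le N$. ■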
Proposition CI: For the system (1), let A1-A4, A7, A8 hold. Let u° e W£ F be the MF-SAC input (17) and 
it- 1 G ^Mi? be the non-adaptive MF-SC input. Then, 



$$\limsup_{N\to\infty}\limsup_{T\to\infty} \frac{1}{T}\int_0^T \|\hat u_i^0 - u_i^0\|^2\, dt = 0 \quad \text{w.p.1}, \quad 1 \le i \le N.$$



Proof: We have the term $I^{N,T} = \frac{1}{T}\int_0^T \|\hat u^0(t; \hat\theta_t, \hat\zeta_t^{N_0}) - u^0(t; \theta^0, \zeta^0)\|^2\, dt$, which we separate into two parts as $I^{N,T} = \frac{1}{T}\int_0^{T_\omega} \|\cdot\|^2\, dt + \frac{1}{T}\int_{T_\omega}^T \|\cdot\|^2\, dt =: I_1^{N,T} + I_2^{N,T}$, where $T_\omega$ is a random instant to be determined later. We will only establish $\lim_{N\to\infty}\lim_{T\to\infty} I_2^{N,T} = 0$ w.p.1 here, as $\lim_{N\to\infty}\lim_{T\to\infty} I_1^{N,T} = 0$ w.p.1 is a simpler case of the same argument.

Convergence of $I_2^{N,T}$: We have



T N,T 

2 T ,, 

rT 



f I \m-k^)-u°{t-,eie)\\ 2 dt = 

- R~ 1 B° T n°a;- ) (f; «?, C°) - R^B ^; 9°, (°)fdt, 



Dropping the subscript i, adopting the notation B° = B(6°), B t = B(9 t ), n° = 11(0°), II t = n(0 t ), w° = 

«°(t ; 0°, c°), «° = «°(t; <?„ Cf ),* = A*; C°),i° = * M°'<, ^N o )), 3 (t)^ 3 (t-,e°x°),m^<t-,0 t ,^ ), 

and using the CS Inequality, we obtain 

J^WK-'Bjsit; 9 U Cf°) - R- 1 B oT s(t; 0°, C°)|| 2 ^ 

n 

*21 "T ^22 • 

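We set $T_\omega$ to be the random instant such that $t > T_\omega$ implies $\|\hat x^0(t) - x^0(t)\| < \epsilon_1(N)$ and $\|\hat s(t) - s(t)\| < \epsilon_1(N)$. We obtain $\limsup_{T\to\infty} I_{21}^{N,T} = O((\epsilon_1(N))^2)$ and $\limsup_{T\to\infty} I_{22}^{N,T} = O((\epsilon_1(N))^2)$ from Section C 2.(i), which implies

$$\limsup_{N\to\infty}\limsup_{T\to\infty} I_2^{N,T} \le \limsup_{N\to\infty}\limsup_{T\to\infty} I_{21}^{N,T} + \limsup_{N\to\infty}\limsup_{T\to\infty} I_{22}^{N,T} = 0 \quad \text{w.p.1.}$$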



2 
T 



_ jN,T T N,T 
— : J„ -+- J, 



Appendix D 

The following five lemmas will be used to prove Proposition 4.4 and Proposition 4.5. We use the notation

$$m((x^N)^0(t; \theta^{[1:N]}, \zeta^0)) \triangleq m\Big(\frac{1}{N}\sum_{k=1}^N x_k^0(t; \theta_k^0, \zeta^0) + \eta\Big), \qquad m((\hat x^N)^0(t; \hat\theta^{[1:N]}, \hat\zeta^{[1:N]})) = m\Big(\frac{1}{N}\sum_{k=1}^N \hat x_k^0(t; \hat\theta_{k,t}, \hat\zeta_{k,t}^{N_0}) + \eta\Big),$$

where $m(\cdot)$ is defined in A4.

Lemma D.1: Let Assumptions A1-A4 hold. For the system (1), the MF control law $u_i^0(t; \theta_i^0, \zeta^0)$ (7) and its corresponding closed-loop solution $x_i^0(t; \theta_i^0, \zeta^0)$ satisfy

$$\sup_{N \ge 1}\max_{1 \le i \le N}\limsup_{T\to\infty} \frac{1}{T}\int_0^T \big(\|x_i^0(t; \theta_i^0, \zeta^0)\|^2 + \|u_i^0(t; \theta_i^0, \zeta^0)\|^2\big)\, dt < \infty. \tag{81}$$

Proof:

The same result has been shown to hold in [37] (Theorem 4.1) for control action in the form of $u_i^0(t) = u_i^{loc}(t) + u_i^{pop}(t)$ using the notation defined in (75). We repeat this result here for completeness.
(i) $\limsup_{T\to\infty} \frac{1}{T}\int_0^T \|x_i^0(t; \theta_i^0, \zeta^0)\|^2\, dt \le K_1 < \infty$:

||x°M°,C°)ll 2 



(0)- / e^^X^BOR-iBf^r;^ ^ )^ 
Jo 

Jo 



Using the CS Inequality, we obtain the inequality: 



limsup 



J- / ii n / . ~n l. n \ iii *J 







^M, u ,C u )|Nt< limsup 

T-s-oo J Jo 



A,(0V)t 



Xi(0) 



T-s-oo J Jo JO 



dt+ 



limsup 

T-s-oo J . 



jo Jo 



< lim sup I x + lim sup 7 2 + lim sup I 3 . 

T-s-oo T-s-oo T-s-oo 

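Since A1 and A2 hold, we obtain $\limsup_{T\to\infty} I_1^T = 0$ w.p.1.

It is shown in [37] (Theorem 4.1) that $\limsup_{T\to\infty} I_2^T \le \kappa_1 < \infty$ uniformly for all $\theta^0 \in \Theta$.
Using Lemma A.4 we write

$$\limsup_{T\to\infty} I_3^T = 3\int_0^\infty \mathrm{Tr}\Big(e^{A_*(\theta_i^0)\tau}\, D D^T e^{A_*^T(\theta_i^0)\tau}\Big)\, d\tau.$$

We use $\sup_{\theta^0 \in \Theta}\|e^{A_*(\theta^0)(t-\tau)}\| \le \beta e^{-\rho(t-\tau)}$ as shown in Lemma A.1 and get

$$\limsup_{T\to\infty} I_3^T \le 3\|D\|^2\beta^2/2\rho = \kappa_2 < \infty.$$

Therefore,

$$\sup_{N \ge 1}\max_{1 \le i \le N}\limsup_{T\to\infty} \frac{1}{T}\int_0^T \|x_i^0(t; \theta_i^0, \zeta^0)\|^2\, dt \le \kappa_1 + \kappa_2 = K_1 < \infty \quad \text{w.p.1.}$$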
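
The constant $\kappa_2$ can be traced through one line of arithmetic. The following worked bound is a sketch under the assumption (not stated in the text) that $\|D\|$ is read as the Frobenius norm $\|D\|_F$, so that no dimension-dependent factor is needed; it uses only the exponential bound $\|e^{A_*\tau}\| \le \beta e^{-\rho\tau}$ from Lemma A.1, with $A_* = A_*(\theta_i^0)$:

```latex
\begin{align*}
3\int_0^\infty \mathrm{Tr}\big(e^{A_*\tau} D D^T e^{A_*^T\tau}\big)\,d\tau
  &= 3\int_0^\infty \|e^{A_*\tau} D\|_F^2\,d\tau \\
  &\le 3\,\|D\|_F^2 \int_0^\infty \beta^2 e^{-2\rho\tau}\,d\tau
   = \frac{3\,\|D\|_F^2\,\beta^2}{2\rho}.
\end{align*}
```

If $\|D\|$ denotes the induced 2-norm instead, the same steps go through up to a factor that depends only on the dimensions of $D$.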

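(ii) $\limsup_{T\to\infty} \frac{1}{T}\int_0^T \|u_i^0(t; \theta_i^0, \zeta^0)\|^2\, dt \le K_2 < \infty$: We have the MF Control Law

$$u_i^0(t; \theta_i^0, \zeta^0) = u_i^{loc}(t; \theta_i^0) + u_i^{pop}(t; \theta_i^0, \zeta^0) = -R^{-1}B_i^{0T}\big(\Pi_i^0 x_i^0(t) + s_i(t; \theta_i^0, \zeta^0)\big), \quad t \ge 0.$$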



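Also, the mass offset function is

$$s_i(t; \theta_i^0, \zeta^0) = -\int_t^\infty e^{A_*^T(\theta_i^0)(\tau - t)}\, Q_i\, x^*(\tau, \zeta^0)\, d\tau.$$

We employ A1 and obtain $M_{x^*} = \sup_{\tau \ge t}\|x^*\|$, $M_B = \sup_{\theta \in \Theta}\|B_\theta\|$, $M_\Pi = \sup_{\theta \in \Theta}\|\Pi_\theta\|$, and $M_Q = \sup_{\theta \in \Theta}\|Q_\theta\|$. Then we obtain

$$\sup_{\theta \in \Theta}\|s_i\| \le M_Q M_{x^*}\beta/\rho \triangleq M_s, \quad 1 \le i \le N.$$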

Using (90), and the bounds given above, we write 

sup max limsup 1/T [ \\v%{t; 0?, (°)\\ 2 dt < HR-^MsMnifi + HR^^MsM, = K 2 w.p.l. (95) 

JV^l 1 ^*^ T^oo Jo 
Consequently, we get $\sup_{N \ge 1}\max_{1 \le i \le N}\limsup_{T\to\infty} \frac{1}{T}\int_0^T \big(\|x_i^0(t; \theta_i^0, \zeta^0)\|^2 + \|u_i^0(t; \theta_i^0, \zeta^0)\|^2\big)\, dt \le K_1 + K_2 < \infty$. As both $K_1$ and $K_2$ are independent of $1 \le i \le N$ and $N \ge 1$, we obtain (81).



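Lemma D.2: Let Assumptions A1-A4 hold. For the system (1), the closed loop solution $x_i^0(t; \theta_i^0, \zeta^0)$ with the control law $u_i^0(t; \theta_i^0, \zeta^0)$ and the cost-coupling function $m((x^N)^0(t; \theta^{[1:N]}, \zeta^0))$ satisfy

$$\sup_{N \ge 1}\max_{1 \le i \le N}\limsup_{T\to\infty} \frac{1}{T}\int_0^T \big\|x_i^0(t; \theta_i^0, \zeta^0) - m((x^N)^0(t; \theta^{[1:N]}, \zeta^0))\big\|^2\, dt < \infty.$$

We recall the definition $J_i(u_i, x^*) = \limsup_{T\to\infty} \frac{1}{T}\int_0^T \{\|x_i - x^*\|_{Q_i}^2 + \|u_i\|_R^2\}\, dt$ w.p.1, where $x^* \in C_b[0, \infty)$ is the solution to the MF Equation System (6).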
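Proof:

Using the CS Inequality we write

$$\frac{1}{T}\int_0^T \big\|x_i^0(t; \theta_i^0, \zeta^0) - m((x^N)^0(t; \theta^{[1:N]}, \zeta^0))\big\|^2\, dt \le \frac{2}{T}\int_0^T \|x_i^0(t; \theta_i^0, \zeta^0)\|^2\, dt + \frac{2}{T}\int_0^T \big\|m((x^N)^0(t; \theta^{[1:N]}, \zeta^0))\big\|^2\, dt \tag{96}$$

$$=: I_1^T + I_2^{N,T}.$$

Using Lemma D.1 we get $\limsup_{T\to\infty} I_1^T \le 2K_1$, where $K_1$ is given in (90).
For $I_2^{N,T}$ we employ A4, and LHS of (96) can be further bounded by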



(97) 
(98) 



rN.T 



< 



27 



2 r T 



1 " 



fc=i 



dt 



dt. 



Using the CS Inequality we write 



g<±Lf 

Jo 



TN 2 



N 



£4(*;C<°) 



< T N ' T + J T 



We have $I_{22}^T = 4\gamma^2\eta^2$. For $I_{21}^{N,T}$, using the CS Inequality again we get



21 " TAT2 Jit 



x°(t;6l{°)\\ 2 dt. 



(99) 
(100) 

(101) 
(102) 

(103) 



We have shown in Lemma D.1 that $\sup_{N \ge 1}\max_{1 \le i \le N}\limsup_{T\to\infty} \frac{1}{T}\int_0^T \|x_i^0(t; \theta_i^0, \zeta^0)\|^2\, dt \le K_1$. Therefore we
get the bound



A 



4 7 v. 



(104) 
We have shown that $\limsup_{T\to\infty} I_1^T \le 2K_1$ and that $\limsup_{T\to\infty} I_2^{N,T} \le 4\gamma^2 K_1/N + 4\gamma^2\eta^2$.
Finally we define $K_3 = 2K_1 + 4\gamma^2 K_1/N + 4\gamma^2\eta^2$ and finish the proof:

$$\sup_{N \ge 1}\max_{1 \le i \le N}\limsup_{T\to\infty} \frac{1}{T}\int_0^T \big\|x_i^0(t; \theta_i^0, \zeta^0) - m((x^N)^0(t; \theta^{[1:N]}, \zeta^0))\big\|^2\, dt \le K_3 < \infty. \tag{105}$$
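
The factor 2 in the bound $\limsup_{T\to\infty} I_1^T \le 2K_1$ reflects the elementary inequality behind the first step of this proof. As a worked reminder, for any vectors $a$ and $b$ (here $a$ plays the role of the closed-loop state and $b$ that of the cost-coupling term):

```latex
\begin{align*}
\|a-b\|^2 &= \|a\|^2 - 2a^Tb + \|b\|^2 \\
          &\le \|a\|^2 + 2\|a\|\,\|b\| + \|b\|^2 && \text{(Cauchy-Schwarz)} \\
          &\le 2\|a\|^2 + 2\|b\|^2 && \text{(since } 2\|a\|\,\|b\| \le \|a\|^2 + \|b\|^2\text{).}
\end{align*}
```

Integrating over $[0, T]$ and dividing by $T$ preserves the inequality, which gives the two time-averaged terms bounded separately above.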



Lemma D.3: For the system (1) subject to A1-A5, when all agents apply the control generated by (7), the cost function $J_i^N(u_i^0, u_{-i}^0)$ (2) satisfies

$$\lim_{N\to\infty} J_i^N(u_i^0, u_{-i}^0) = J_i(u_i^0, x^*) \quad \text{w.p.1}, \quad 1 \le i \le N.$$

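Proof:

From (2) we have the cost function

$$J_i^N(u_i^0, u_{-i}^0) = \limsup_{T\to\infty} \frac{1}{T}\int_0^T \Big\{\big\|x_i^0(t; \theta_i^0, \zeta^0) - m((x^N)^0(t; \theta^{[1:N]}, \zeta^0))\big\|_Q^2 + \|u_i^0(t; \theta_i^0, \zeta^0)\|_R^2\Big\}\, dt. \tag{106}$$

Adding and subtracting $x^*(t, \zeta^0)$, $0 \le t \le T$, to the first integrand on the RHS, we get

$$\begin{aligned}
J_i^N(u_i^0, u_{-i}^0) \le{}& J_i(u_i^0, x^*(t, \zeta^0)) \\
&+ \limsup_{T\to\infty} \frac{2}{T}\int_0^T \Big\{\big(x_i^0(t; \theta_i^0, \zeta^0) - x^*(t, \zeta^0)\big)^T Q \big(x^*(t, \zeta^0) - m((x^N)^0(t; \theta^{[1:N]}, \zeta^0))\big)\Big\}\, dt \\
&+ \limsup_{T\to\infty} \frac{1}{T}\int_0^T \big\|m((x^N)^0(t; \theta^{[1:N]}, \zeta^0)) - x^*(t, \zeta^0)\big\|_Q^2\, dt \\
=:{}& J_i(u_i^0, x^*) + I_1^N + I_2^N,
\end{aligned}$$

where

$$J_i(u_i^0, x^*(t, \zeta^0)) \triangleq \limsup_{T\to\infty} \frac{1}{T}\int_0^T \Big\{\|x_i^0(t; \theta_i^0, \zeta^0) - x^*(t, \zeta^0)\|_Q^2 + \|u_i^0(t; \theta_i^0, \zeta^0)\|_R^2\Big\}\, dt.$$

It is shown in [37, Lemma 6.3] that $I_1^N = O(\epsilon_2(N))$ and $I_2^N = o(\epsilon_2(N))$ where $\epsilon_2(N) \to 0$ as $N \to \infty$. Therefore,

$$J_i^N(u_i^0, u_{-i}^0) \le J_i(u_i^0, x^*) + O(\epsilon_2(N)). \tag{112}$$

Adding and subtracting $m((x^N)^0(t; \theta^{[1:N]}, \zeta^0))$ to $J_i(u_i^0, x^*)$, and following the same steps above one obtains

$$J_i(u_i^0, x^*) \le J_i^N(u_i^0, u_{-i}^0) + O(\epsilon_2(N)). \tag{113}$$

Hence, one gets

$$\lim_{N\to\infty} J_i^N(u_i^0, u_{-i}^0) = J_i(u_i^0, x^*) \quad \text{w.p.1}, \quad 1 \le i \le N. \tag{114}$$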



Lemma DA: Under A1-A5, the set of controls U^ iF — 1 < i < N} is such that when Ui € is any 
control adapted to F N , 



$$\lim_{N\to\infty}\inf_{u_i} J_i^N(u_i, u_{-i}^0) = \lim_{N\to\infty} J_i^N(u_i^0, u_{-i}^0) \quad \text{w.p.1}, \quad 1 \le i \le N.$$
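Proof:

Let $u_i = u_i(t; \theta_i^0, \zeta^0)$ be an admissible feedback control action and $x_i = x_i(t; \theta_i^0, \zeta^0)$ be the corresponding closed loop solution. Then, from (2) we have the cost function

$$J_i(u_i, x^*) = \limsup_{T\to\infty} \frac{1}{T}\int_0^T \big\{\|x_i(t) - x^*(t, \zeta^0)\|_Q^2 + \|u_i(t)\|_R^2\big\}\, dt. \tag{115}$$

Adding and subtracting $m^N_{u_i,u^0_{-i}}(t) = m(x_i(t; \theta_i^0, \zeta^0), x^0_{-i}(t; \theta^{[1:N]}, \zeta^0))$ we get

$$\begin{aligned}
J_i(u_i, x^*) &= \limsup_{T\to\infty} \frac{1}{T}\int_0^T \big\{\|x_i(t) - m^N_{u_i,u^0_{-i}}(t) + m^N_{u_i,u^0_{-i}}(t) - x^*(t, \zeta^0)\|_Q^2 + \|u_i(t)\|_R^2\big\}\, dt && (116) \\
&\le \limsup_{T\to\infty} \frac{1}{T}\int_0^T \big\{\|x_i(t) - m^N_{u_i,u^0_{-i}}(t)\|_Q^2 + \|u_i(t)\|_R^2\big\}\, dt && (117) \\
&\quad + \limsup_{T\to\infty} \frac{1}{T}\int_0^T \|x^*(t, \zeta^0) - m^N_{u_i,u^0_{-i}}(t)\|_Q^2\, dt && (118) \\
&\quad + \limsup_{T\to\infty} \frac{2}{T}\int_0^T \big(x_i(t) - m^N_{u_i,u^0_{-i}}(t)\big)^T Q \big(m^N_{u_i,u^0_{-i}}(t) - x^*(t, \zeta^0)\big)\, dt && (119) \\
&\le J_i^N(u_i, u_{-i}^0) + \limsup_{T\to\infty} \frac{1}{T}\int_0^T \|x^*(t, \zeta^0) - m^N_{u_i,u^0_{-i}}(t)\|_Q^2\, dt && (120) \\
&\quad + \limsup_{T\to\infty} \frac{2}{T}\int_0^T \big(x_i(t) - m^N_{u_i,u^0_{-i}}(t)\big)^T Q \big(m^N_{u_i,u^0_{-i}}(t) - x^*(t, \zeta^0)\big)\, dt && (121) \\
&=: J_i^N(u_i, u_{-i}^0) + I_1^N + I_2^N. && (122)
\end{aligned}$$

It is shown in [37, Lemma 6.3] that $I_1^N = o(\epsilon_2(N))$ where $\epsilon_2(N) \to 0$ as $N \to \infty$.
For $I_2^N$:

We add and subtract $(m^N)^0(t) \triangleq m((x^N)^0(t; \theta^{[1:N]}, \zeta^0))$ and obtain

$$\begin{aligned}
I_2^N &= \limsup_{T\to\infty} \frac{2}{T}\int_0^T \big(x_i(t) - m^N_{u_i,u^0_{-i}}(t)\big)^T Q \big(m^N_{u_i,u^0_{-i}}(t) - (m^N)^0(t) + (m^N)^0(t) - x^*(t, \zeta^0)\big)\, dt && (123) \\
&\le \limsup_{T\to\infty} \frac{2}{T}\int_0^T \big(x_i(t) - m^N_{u_i,u^0_{-i}}(t)\big)^T Q \big(m^N_{u_i,u^0_{-i}}(t) - (m^N)^0(t)\big)\, dt && (124) \\
&\quad + \limsup_{T\to\infty} \frac{2}{T}\int_0^T \big(x_i(t) - m^N_{u_i,u^0_{-i}}(t)\big)^T Q \big((m^N)^0(t) - x^*(t, \zeta^0)\big)\, dt && (125) \\
&=: I_{21}^N + I_{22}^N. && (126)
\end{aligned}$$

It is shown in [37, Lemma 6.4] that $|I_{21}^N| = O(\epsilon_2(N))$ and it is shown in [37, Lemma 6.4] that $|I_{22}^N| = O(1/N)$.
As $u_i^0(t; \theta_i^0, \zeta^0)$ is the optimal tracking solution to tracking signal $x^*(t, \zeta^0)$ (5), we obtain

$$J_i(u_i^0, x^*) \le \inf_{u_i} J_i^N(u_i, u_{-i}^0) + o(\epsilon_2(N)) + O(\epsilon_2(N)) + O(1/N). \tag{127}$$

Adding and subtracting $x^*(t, \zeta^0)$ to $J_i^N(u_i, u_{-i}^0)$, and following the same steps above one obtains

$$\inf_{u_i} J_i^N(u_i, u_{-i}^0) \le J_i(u_i^0, x^*) + o(\epsilon_2(N)) + O(\epsilon_2(N)) + O(1/N). \tag{128}$$

Hence, $\lim_{N\to\infty}\inf_{u_i} J_i^N(u_i, u_{-i}^0) = J_i(u_i^0, x^*)$ w.p.1, $1 \le i \le N$. It is shown in Lemma D.3 that $\lim_{N\to\infty} J_i^N(u_i^0, u_{-i}^0) = J_i(u_i^0, x^*)$ w.p.1, $1 \le i \le N$. Therefore, one gets

$$\lim_{N\to\infty}\inf_{u_i} J_i^N(u_i, u_{-i}^0) = \lim_{N\to\infty} J_i^N(u_i^0, u_{-i}^0) \quad \text{w.p.1}, \quad 1 \le i \le N. \tag{129}$$

■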

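Lemma D.5: Under the MF-SAC Law and A1-A4, A7, A8

$$\lim_{N\to\infty}\limsup_{T\to\infty} \frac{1}{T}\int_0^T \big\|m((x^N)^0(t; \theta^{[1:N]}, \zeta^0)) - m((\hat x^N)^0(t; \hat\theta^{[1:N]}, \hat\zeta^{[1:N]}))\big\|^2\, dt = 0 \quad \text{w.p.1}, \quad 1 \le i \le N.$$

Proof: We have the equation $I^{N,T} = \frac{1}{T}\int_0^T \|m((x^N)^0(t; \theta^{[1:N]}, \zeta^0)) - m((\hat x^N)^0(t; \hat\theta^{[1:N]}, \hat\zeta^{[1:N]}))\|^2\, dt$, where $\theta^{[1:N]} \triangleq \{\theta_i^0,\ 1 \le i \le N\}$, $\hat\theta^{[1:N]} \triangleq \{\hat\theta_{i,t},\ 0 \le t \le T,\ 1 \le i \le N\}$, and $\hat\zeta^{[1:N]} = \{\hat\zeta_{i,t}^{N_0},\ 0 \le t \le T,\ 1 \le i \le N\}$. Employing A4, we get the inequality

$$I^{N,T} \le \frac{\gamma^2}{T}\int_0^T \Big\|\frac{1}{N}\sum_{i=1}^N x_i^0(t; \theta_i^0, \zeta^0) - \frac{1}{N}\sum_{i=1}^N \hat x_i^0(t; \hat\theta_i^{0,t}, \hat\zeta_i^{0,t}(N_0))\Big\|^2\, dt.$$

Using the CS Inequality we get $I^{N,T} \le \frac{\gamma^2}{T}\int_0^T \frac{1}{N}\sum_{i=1}^N \|x_i^0(t) - \hat x_i^0(t)\|^2\, dt$, where we use the notation $x_i^0(t) = x_i^0(t; \theta_i^0, \zeta^0)$ and $\hat x_i^0(t) = \hat x_i^0(t; \hat\theta_i^{0,t}, \hat\zeta_i^{0,t}(N_0))$. Applying the limit superior we get $\limsup_{T\to\infty} I^{N,T} \le \frac{\gamma^2}{N}\sum_{i=1}^N \big\{\limsup_{T\to\infty} \frac{1}{T}\int_0^T \|\hat x_i^0(t) - x_i^0(t)\|^2\, dt\big\}$. It is shown in Theorem 4.3 that $\limsup_{T\to\infty} \frac{1}{T}\int_0^T \|\hat x_i^0(t) - x_i^0(t)\|^2\, dt = O(\epsilon_1(N)^2)$ w.p.1; hence, we get $\limsup_{T\to\infty} I^{N,T} = O(\epsilon_1(N)^2)$ w.p.1, which implies $\limsup_{N\to\infty}\limsup_{T\to\infty} \frac{1}{T}\int_0^T \|m((x^N)^0(t; \theta^{[1:N]}, \zeta^0)) - m((\hat x^N)^0(t; \hat\theta^{[1:N]}, \hat\zeta^{[1:N]}))\|^2\, dt = 0$ w.p.1. ■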

Proof of Proposition 4.4 

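The cost function (19) is repeated here:

$$J_i^N(\hat u_i^0, \hat u_{-i}^0) = \limsup_{T\to\infty} \frac{1}{T}\int_0^T \Big\{\big\|\hat x_i^0(t; \hat\theta_i^{0,t}, \hat\zeta_i^{0,t}(N_0)) - m((\hat x^N)^0(t; \hat\theta^{[1:N]}, \hat\zeta^{[1:N]}))\big\|_Q^2 + \|\hat u_i^0(t; \hat\theta_{i,t}, \hat\zeta_{i,t}^{N_0})\|_R^2\Big\}\, dt,$$

where $\hat\theta^{[1:N]} = \{\hat\theta_i^{0,t},\ 1 \le i \le N\}$ and $\hat\zeta^{[1:N]} = \{\hat\zeta_i^{0,t}(N_0),\ 1 \le i \le N\}$. We expand the term as

$$\begin{aligned}
I^{N,T} = \frac{1}{T}\int_0^T \Big\{\big\|&\hat x_i^0(t; \hat\theta_i^{0,t}, \hat\zeta_i^{0,t}(N_0)) - x_i^0(t; \theta_i^0, \zeta^0) \\
&+ x_i^0(t; \theta_i^0, \zeta^0) - m((x^N)^0(t; \theta^{[1:N]}, \zeta^0)) \\
&+ m((x^N)^0(t; \theta^{[1:N]}, \zeta^0)) - m((\hat x^N)^0(t; \hat\theta^{[1:N]}, \hat\zeta^{[1:N]}))\big\|_Q^2 \\
&+ \big\|\hat u_i^0(t; \hat\theta_{i,t}, \hat\zeta_{i,t}^{N_0}) - u_i^0(t; \theta_i^0, \zeta^0) + u_i^0(t; \theta_i^0, \zeta^0)\big\|_R^2\Big\}\, dt.
\end{aligned}$$

In the sequel we adopt the notation $x_i^0 = x_i^0(t; \theta_i^0, \zeta^0)$, $\hat x_i^0 = \hat x_i^0(t; \hat\theta_i^{0,t}, \hat\zeta_i^{0,t}(N_0))$, $(m^N)^0 = m((x^N)^0(t; \theta^{[1:N]}, \zeta^0))$, $(\hat m^N)^0 \triangleq m((\hat x^N)^0(t; \hat\theta^{[1:N]}, \hat\zeta^{[1:N]}))$, $u_i^0 = u_i^0(t; \theta_i^0, \zeta^0)$, $\hat u_i^0 = \hat u_i^0(t; \hat\theta_{i,t}, \hat\zeta_{i,t}^{N_0})$, and get the inequality

$$\begin{aligned}
J_i^N(\hat u_i^0, \hat u_{-i}^0) \le{}& \limsup_{T\to\infty} \frac{1}{T}\int_0^T \|\hat x_i^0 - x_i^0\|_Q^2\, dt
+ \limsup_{T\to\infty} \frac{1}{T}\int_0^T \|x_i^0 - (m^N)^0\|_Q^2\, dt \\
&+ \limsup_{T\to\infty} \frac{1}{T}\int_0^T \|(m^N)^0 - (\hat m^N)^0\|_Q^2\, dt
+ \limsup_{T\to\infty} \frac{2\|Q_i\|}{T}\int_0^T (\hat x_i^0 - x_i^0)^T \big(x_i^0 - (m^N)^0\big)\, dt \\
&+ \limsup_{T\to\infty} \frac{2\|Q_i\|}{T}\int_0^T (\hat x_i^0 - x_i^0)^T \big((m^N)^0 - (\hat m^N)^0\big)\, dt \\
&+ \limsup_{T\to\infty} \frac{2\|Q_i\|}{T}\int_0^T \big(x_i^0 - (m^N)^0\big)^T \big((m^N)^0 - (\hat m^N)^0\big)\, dt \\
&+ \limsup_{T\to\infty} \frac{1}{T}\int_0^T \|\hat u_i^0 - u_i^0\|_R^2\, dt
+ \limsup_{T\to\infty} \frac{1}{T}\int_0^T \|u_i^0\|_R^2\, dt \\
&+ \limsup_{T\to\infty} \frac{2\|R\|}{T}\int_0^T (\hat u_i^0 - u_i^0)^T (u_i^0)\, dt
\end{aligned} \tag{130}$$

$$\begin{aligned}
\le{}& \limsup_{T\to\infty} I_1^{N,T} + \limsup_{T\to\infty} I_2^T + \limsup_{T\to\infty} I_3^{N,T} \\
&+ \limsup_{T\to\infty} I_4^{N,T} + \limsup_{T\to\infty} I_5^{N,T} + \limsup_{T\to\infty} I_6^{N,T} \\
&+ \limsup_{T\to\infty} I_7^{N,T} + \limsup_{T\to\infty} I_8^T + \limsup_{T\to\infty} I_9^{N,T}.
\end{aligned}$$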

(i) Convergence of $I_1^{N,T}$: We show in Theorem 3.2 that $\hat\theta_i(t) \to \theta_i^0$ w.p.1 as $t \to \infty$, $1 \le i \le N$, and in Theorem 3.3 that $\hat\zeta_{i,t}^{N_0} \to \zeta^0$ w.p.1, as $t \to \infty$ and $N \to \infty$, $1 \le i \le N$; therefore the hypotheses for Theorem 4.3 are satisfied and $\limsup_{T\to\infty} I_1^{N,T} = O(\epsilon_1(N)^2)$ w.p.1.

(ii) $I_2^T$ and $I_8^T$: $I_2^T + I_8^T$ equals the non-adaptive MF cost function; i.e., $J_i^N(u_i^0, u_{-i}^0) = \limsup_{T\to\infty}(I_2^T + I_8^T)$ w.p.1.

(iii) Convergence of $I_3^{N,T}$: We have

$$I_3^{N,T} \le \frac{\|Q_i\|}{T}\int_0^T \|(m^N)^0(t) - (\hat m^N)^0(t)\|^2\, dt =: \|Q_i\|\, I_{31}^{N,T}.$$

From Lemma D.5 we have $\limsup_{T\to\infty} I_{31}^{N,T} = O(\epsilon_1(N)^2)$. Therefore,

$$\limsup_{T\to\infty} I_3^{N,T} = \limsup_{T\to\infty} \|Q_i\|\, I_{31}^{N,T} = O(\epsilon_1(N)^2) \quad \text{w.p.1.}$$



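(iv) Convergence of $I_4^{N,T}$: We have

$$I_4^{N,T} = \frac{2\|Q_i\|}{T}\int_0^T \big(\hat x_i^0(t) - x_i^0(t)\big)^T \big(x_i^0(t) - (m^N)^0(t)\big)\, dt.$$

Applying the CS Inequality we obtain

$$I_4^{N,T} \le 2\|Q_i\| \Big(\frac{1}{T}\int_0^T \|\hat x_i^0(t) - x_i^0(t)\|^2\, dt\Big)^{1/2} \Big(\frac{1}{T}\int_0^T \|x_i^0(t) - (m^N)^0(t)\|^2\, dt\Big)^{1/2} =: 2\|Q_i\|\, I_{41}^{N,T} \times I_{42}^T.$$

We prove in Lemma D.2 that $\limsup_{T\to\infty} I_{42}^T \le K_3$ w.p.1. It is proved in Theorem 3.2 that $\hat\theta_i(t) \to \theta_i^0$ w.p.1 as $t \to \infty$ and $\hat\zeta_{i,t}^{N_0} \to \zeta^0$ w.p.1 as $t \to \infty$ and $N \to \infty$. Hence, we get $\limsup_{T\to\infty} I_{41}^{N,T} = O(\epsilon_1(N))$ w.p.1. Therefore,

$$\limsup_{T\to\infty} I_4^{N,T} \le 2\|Q_i\| \big(\limsup_{T\to\infty} I_{41}^{N,T}\big)\big(\limsup_{T\to\infty} I_{42}^T\big) = O(\epsilon_1(N)).$$

Hence, $\limsup_{T\to\infty} I_4^{N,T} = O(\epsilon_1(N))$.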

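(v) Convergence of $I_5^{N,T}$: We have the equation

$$I_5^{N,T} = \frac{2\|Q_i\|}{T}\int_0^T \big(\hat x_i^0(t) - x_i^0(t)\big)^T \big((m^N)^0(t) - (\hat m^N)^0(t)\big)\, dt.$$

Applying the CS Inequality we obtain

$$I_5^{N,T} \le 2\|Q_i\| \Big(\frac{1}{T}\int_0^T \|\hat x_i^0(t) - x_i^0(t)\|^2\, dt\Big)^{1/2} \Big(\frac{1}{T}\int_0^T \|(m^N)^0(t) - (\hat m^N)^0(t)\|^2\, dt\Big)^{1/2} =: 2\|Q_i\|\, I_{51}^{N,T} \times I_{52}^{N,T}.$$

We have shown in Theorem 3.2 that $\hat\theta_i(t) \to \theta_i^0$ w.p.1 as $t \to \infty$, and $\hat\zeta_{i,t}^{N_0} \to \zeta^0$ as $t \to \infty$ and $N \to \infty$ w.p.1. Hence, we get $\limsup_{T\to\infty} I_{51}^{N,T} = O(\epsilon_1(N))$ w.p.1. The convergence of $I_{52}^{N,T}$ was shown as $\limsup_{T\to\infty} I_{52}^{N,T} = O(\epsilon_1(N))$ w.p.1 in Lemma D.5. Therefore,

$$\limsup_{T\to\infty} I_5^{N,T} \le 2\|Q_i\| \big(\limsup_{T\to\infty} I_{51}^{N,T}\big)\big(\limsup_{T\to\infty} I_{52}^{N,T}\big) = O(\epsilon_1(N)^2).$$

Hence, $\limsup_{T\to\infty} I_5^{N,T} = O(\epsilon_1(N)^2)$.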

(vi) Convergence of $I_6^{N,T}$: We have the equation

$$I_6^{N,T} = \frac{2\|Q_i\|}{T}\int_0^T \big(x_i^0(t) - (m^N)^0(t)\big)^T \big((m^N)^0(t) - (\hat m^N)^0(t)\big)\, dt.$$

Applying the CS Inequality we obtain

$$I_6^{N,T} \le 2\|Q_i\| \Big(\frac{1}{T}\int_0^T \|x_i^0(t) - (m^N)^0(t)\|^2\, dt\Big)^{1/2} \Big(\frac{1}{T}\int_0^T \|(m^N)^0(t) - (\hat m^N)^0(t)\|^2\, dt\Big)^{1/2} =: 2\|Q_i\|\, I_{61}^T \times I_{62}^{N,T}.$$

Using Lemma D.2, we get $\limsup_{T\to\infty} I_{61}^T \le K_3$ w.p.1. The convergence of $I_{62}^{N,T}$ was shown as

$$\limsup_{T\to\infty} I_{62}^{N,T} = O(\epsilon_1(N)) \quad \text{w.p.1}$$

in Lemma D.5. Therefore,

$$\limsup_{T\to\infty} I_6^{N,T} \le 2\|Q_i\| \big(\limsup_{T\to\infty} I_{61}^T\big) \times \big(\limsup_{T\to\infty} I_{62}^{N,T}\big) = O(\epsilon_1(N)) \quad \text{w.p.1.}$$

Hence, $\limsup_{T\to\infty} I_6^{N,T} = O(\epsilon_1(N))$ w.p.1.
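(vii) Convergence of $I_7^{N,T}$: We can bound $I_7^{N,T}$ from above as

$$I_7^{N,T} \le \frac{\|R\|}{T}\int_0^T \|\hat u_i^0(t) - u_i^0(t)\|^2\, dt =: \|R\|\, I_{71}^{N,T}.$$

From Proposition C.1 we get $\limsup_{T\to\infty} I_{71}^{N,T} = O(\epsilon_1(N)^2)$ w.p.1. Therefore,

$$\limsup_{T\to\infty} I_7^{N,T} = \limsup_{T\to\infty} \|R\|\, I_{71}^{N,T} = O(\epsilon_1(N)^2) \quad \text{w.p.1.}$$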



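(viii) Convergence of $I_9^{N,T}$: We have the equation

$$I_9^{N,T} = \frac{2\|R\|}{T}\int_0^T \big(\hat u_i^0(t) - u_i^0(t)\big)^T \big(u_i^0(t)\big)\, dt.$$

Applying the CS Inequality we obtain

$$I_9^{N,T} \le 2\|R\| \Big(\frac{1}{T}\int_0^T \|\hat u_i^0(t) - u_i^0(t)\|^2\, dt\Big)^{1/2} \Big(\frac{1}{T}\int_0^T \|u_i^0(t)\|^2\, dt\Big)^{1/2} =: 2\|R\|\, I_{91}^{N,T} \times I_{92}^T.$$

It is shown in Proposition C.1 that $\limsup_{T\to\infty} I_{91}^{N,T} = O(\epsilon_1(N))$ w.p.1. We obtain $\limsup_{T\to\infty} I_{92}^T \le K_2$ w.p.1 as shown in Lemma D.1. Therefore,

$$\limsup_{T\to\infty} I_9^{N,T} \le 2\|R\| \big(\limsup_{T\to\infty} I_{91}^{N,T}\big) \times \big(\limsup_{T\to\infty} I_{92}^T\big) = O(\epsilon_1(N)) \quad \text{w.p.1.}$$

Hence, $\limsup_{T\to\infty} I_9^{N,T} = O(\epsilon_1(N))$ w.p.1.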
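
Steps (iv), (v), (vi) and (viii) all bound a time-averaged cross term by the product of two root-mean-square factors. The following Python sketch illustrates that step numerically on sampled scalar signals. The signals, the step size and the helper names are hypothetical and chosen only for illustration, and a Riemann sum stands in for the time integral.

```python
import math

def time_average(values):
    """Riemann-sum approximation of (1/T) * integral over [0, T] on a uniform grid."""
    return sum(values) / len(values)

def cross_term_and_bound(a, b):
    """Return |(2/T) int a(t) b(t) dt| and the Cauchy-Schwarz bound 2 * rms(a) * rms(b)."""
    cross = abs(2.0 * time_average([x * y for x, y in zip(a, b)]))
    rms_a = math.sqrt(time_average([x * x for x in a]))
    rms_b = math.sqrt(time_average([y * y for y in b]))
    return cross, 2.0 * rms_a * rms_b

# Hypothetical signals: a decaying "estimation error" and a bounded "tracking error".
dt, T = 0.01, 50.0
grid = [k * dt for k in range(int(T / dt))]
a = [math.exp(-0.2 * t) for t in grid]
b = [1.0 + math.sin(t) for t in grid]
cross, bound = cross_term_and_bound(a, b)
```

Because the first factor tends to zero whenever the mean-square error does, the cross term vanishes as long as the second factor stays bounded, which is the pattern used in those steps.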
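Overall we have shown that

$$\limsup_{T\to\infty} I^{N,T} \le \limsup_{T\to\infty}(I_2^T + I_8^T) + O(\epsilon_1(N)) \quad \text{w.p.1.}$$

Using the same decomposition technique applied in (130) we also show that

$$J_i^N(u_i^0, u_{-i}^0) \le J_i^N(\hat u_i^0, \hat u_{-i}^0) + O(\epsilon_1(N)) \quad \text{w.p.1.}$$

Consequently,

$$\lim_{N\to\infty} J_i^N(\hat u_i^0, \hat u_{-i}^0) = \lim_{N\to\infty} J_i^N(u_i^0, u_{-i}^0) \quad \text{w.p.1}, \quad 1 \le i \le N.$$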

Proof of Proposition 4.5 

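Let $u_i = u_i(t; \theta_i^0, \zeta^0)$ be an admissible feedback control action and $x_i = x_i(t; \theta_i^0, \zeta^0)$ be the corresponding closed loop solution. LHS of (20) is written as

$$J_i^N(u_i, u_{-i}^0) = \limsup_{T\to\infty} \frac{1}{T}\int_0^T \big\{\|x_i(t) - m^N_{u_i,u^0_{-i}}(t)\|_{Q_i}^2 + \|u_i(t)\|_R^2\big\}\, dt,$$

where $m^N_{u_i,u^0_{-i}}(t) = m(x_i(t; \theta_i^0, \zeta^0), x^0_{-i}(t; \theta^{[1:N]}, \zeta^0))$. By adding and subtracting $m^N_{u_i,\hat u^0_{-i}}(t) \triangleq m(x_i(t; \theta_i^0, \zeta^0), \hat x^0_{-i}(t; \hat\theta^{[1:N]}, \hat\zeta^{[1:N]}))$ to the integrand, we get

$$J_i^N(u_i, u_{-i}^0) = \limsup_{T\to\infty} \frac{1}{T}\int_0^T \big\{\|x_i(t) - m^N_{u_i,\hat u^0_{-i}}(t) + m^N_{u_i,\hat u^0_{-i}}(t) - m^N_{u_i,u^0_{-i}}(t)\|_{Q_i}^2 + \|u_i(t)\|_R^2\big\}\, dt. \tag{131}$$

Expanding (131), we get

$$\begin{aligned}
J_i^N(u_i, u_{-i}^0) = \limsup_{T\to\infty}\Big\{&\frac{1}{T}\int_0^T \|x_i(t) - m^N_{u_i,\hat u^0_{-i}}(t)\|_{Q_i}^2\, dt + \frac{1}{T}\int_0^T \|m^N_{u_i,\hat u^0_{-i}}(t) - m^N_{u_i,u^0_{-i}}(t)\|_{Q_i}^2\, dt \\
&+ \frac{2}{T}\int_0^T \big(x_i(t) - m^N_{u_i,\hat u^0_{-i}}(t)\big)^T Q_i \big(m^N_{u_i,\hat u^0_{-i}}(t) - m^N_{u_i,u^0_{-i}}(t)\big)\, dt + \frac{1}{T}\int_0^T \|u_i(t)\|_R^2\, dt\Big\} \\
=: \limsup_{T\to\infty}\big\{&I_1^{N,T} + I_2^{N,T} + I_3^{N,T} + I_4^T\big\},
\end{aligned} \tag{132}$$

$$J_i^N(u_i, u_{-i}^0) \le \limsup_{T\to\infty}\{I_1^{N,T} + I_4^T\} + \limsup_{T\to\infty} I_2^{N,T} + \limsup_{T\to\infty} I_3^{N,T} \quad \text{w.p.1.}$$

We have $\limsup_{T\to\infty}\{I_1^{N,T} + I_4^T\} = J_i^N(u_i, \hat u_{-i}^0)$; therefore,

$$J_i^N(u_i, u_{-i}^0) \le J_i^N(u_i, \hat u_{-i}^0) + \limsup_{T\to\infty} I_2^{N,T} + \limsup_{T\to\infty} I_3^{N,T} \quad \text{w.p.1.} \tag{133}$$

(i) Convergence of $I_2^{N,T}$: Lemma D.5 states that

$$\limsup_{T\to\infty} I_2^{N,T} = O((\epsilon_1(N))^2) \quad \text{w.p.1.}$$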
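(ii) Convergence of $I_3^{N,T}$: Applying the CS Inequality we obtain

$$I_3^{N,T} \le 2\|Q_i\| \Big(\frac{1}{T}\int_0^T \|x_i(t) - m^N_{u_i,\hat u^0_{-i}}(t)\|^2\, dt\Big)^{1/2} \Big(\frac{1}{T}\int_0^T \|m^N_{u_i,\hat u^0_{-i}}(t) - m^N_{u_i,u^0_{-i}}(t)\|^2\, dt\Big)^{1/2} =: 2\|Q_i\|\, I_{31}^{N,T} \times I_{32}^{N,T}.$$

Using Lemma D.2 we obtain $\limsup_{T\to\infty} I_{31}^{N,T} \le K_4$ w.p.1 and using Lemma D.5, we get

$$\limsup_{T\to\infty} I_{32}^{N,T} = O(\epsilon_1(N)) \quad \text{w.p.1.}$$

Therefore,

$$\limsup_{T\to\infty} I_3^{N,T} \le 2\|Q_i\| \big(\limsup_{T\to\infty} I_{31}^{N,T}\big) \times \big(\limsup_{T\to\infty} I_{32}^{N,T}\big) = O(\epsilon_1(N)).$$

Hence, $\limsup_{T\to\infty} I_3^{N,T} = O(\epsilon_1(N))$.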

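Repeating (133) here for ease of reference we see that

$$J_i^N(u_i, u_{-i}^0) \le J_i^N(u_i, \hat u_{-i}^0) + \limsup_{T\to\infty}\big(I_2^{N,T} + I_3^{N,T}\big) \quad \text{w.p.1},$$

where $\limsup_{T\to\infty}(I_2^{N,T} + I_3^{N,T}) = O(\epsilon_1(N))$. Hence, $J_i^N(u_i, u_{-i}^0) \le J_i^N(u_i, \hat u_{-i}^0) + O(\epsilon_1(N))$ w.p.1. Applying the decomposition technique in (132) for $J_i^N(u_i, \hat u_{-i}^0)$, one can also get $J_i^N(u_i, \hat u_{-i}^0) \le J_i^N(u_i, u_{-i}^0) + O(\epsilon_1(N))$ w.p.1, which implies the claim that $\lim_{N\to\infty} J_i^N(u_i, \hat u_{-i}^0) = \lim_{N\to\infty} J_i^N(u_i, u_{-i}^0)$ w.p.1, $1 \le i \le N$. Therefore,

$$\lim_{N\to\infty}\inf_{u_i} J_i^N(u_i, \hat u_{-i}^0) = \lim_{N\to\infty}\inf_{u_i} J_i^N(u_i, u_{-i}^0) \quad \text{w.p.1}, \quad 1 \le i \le N. \tag{134}$$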



Proof of Theorem 2.2 

First, it is evident that Theorem 3.2 gives (a), Theorem 3.3 gives (b), and Theorem 4.2 gives (c). Second, using a technique similar to that used in [37, Theorem 6.2], it is shown in Proposition 4.4 that

$$J_i^N(\hat u_i^0, \hat u_{-i}^0) \le J_i^N(u_i^0, u_{-i}^0) + O(\epsilon_1(N)) \quad \text{w.p.1}, \tag{135}$$

where $\epsilon_1(N) \to 0$ as $N \to \infty$. Then, Lemma D.4 gives

$$J_i^N(u_i^0, u_{-i}^0) \le \inf_{u_i} J_i^N(u_i, u_{-i}^0) + o(\epsilon_2(N)) + O(\epsilon_2(N)) + O(1/N) \quad \text{w.p.1}, \quad 1 \le i \le N, \tag{136}$$

where $\epsilon_2(N) \to 0$ as $N \to \infty$. Finally, Proposition 4.5 states that

$$\inf_{u_i} J_i^N(u_i, u_{-i}^0) \le \inf_{u_i} J_i^N(u_i, \hat u_{-i}^0) + O(\epsilon_1(N)), \tag{137}$$

w.p.1, $1 \le i \le N$, where $\epsilon_1(N) \to 0$ as $N \to \infty$.

Equations (135), (136), (137) together then give the first inequality in

$$J_i^N(\hat u_i^0, \hat u_{-i}^0) - \epsilon(N) \le \inf_{u_i} J_i^N(u_i, \hat u_{-i}^0) \le J_i^N(\hat u_i^0, \hat u_{-i}^0)$$

w.p.1, $1 \le i \le N$, while the second is immediate, where $\epsilon(N) = O(\epsilon_1(N)) + O(\epsilon_2(N)) + o(\epsilon_2(N)) + O(1/N)$.
This concludes the proof for (d).

Claim (e) restates Proposition 4.4, and claim (f) is a consequence of the $\epsilon$-Nash property (d), with the existence of the limits given by (19). ■
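
For reference, the way (135), (136) and (137) combine can be written as a single chain. This is a sketch of the bookkeeping only, under the reading that hats denote the adaptive (MF-SAC) quantities and unhatted symbols the non-adaptive MF ones, with $\epsilon(N)$ collecting the error terms as defined in the proof:

```latex
\begin{align*}
J_i^N(\hat u_i^0,\hat u_{-i}^0)
  &\le J_i^N(u_i^0,u_{-i}^0) + O(\epsilon_1(N)) && \text{by (135)} \\
  &\le \inf_{u_i} J_i^N(u_i,u_{-i}^0) + o(\epsilon_2(N)) + O(\epsilon_2(N)) + O(1/N) + O(\epsilon_1(N)) && \text{by (136)} \\
  &\le \inf_{u_i} J_i^N(u_i,\hat u_{-i}^0) + \epsilon(N) && \text{by (137).}
\end{align*}
```

Subtracting $\epsilon(N)$ from both ends gives the first inequality of the $\epsilon$-Nash property in claim (d).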